A knowledge base management system (KBMS) represents an innovative technology
necessary for supporting applications that require knowledge to reason about large quantities
of data. The integration of artificial intelligence and database management technologies is
critical to the success of KBMS technology and is the focus of our research. A design
methodology for the integration process and several implementation techniques for a knowledge
base management system (KBMS) have been studied and the results are presented in this
dissertation.

Three important elements for integration are identified in our research. First, an
object-oriented knowledge representation model is used to define object types by (a) their
structural relationships with other object types, (b) operations that are executed against the
occurrences of the object type and (c) rule-based knowledge that captures constraints, rules
of inference, expert knowledge, etc., relevant to the object type and its occurrences. Thus,
the model integrates facts and rules within the object types of the knowledge base. Next, we
use the constructs of a single knowledge manipulation language (KML) to specify both
operations and rules for object types and we identify the constructs of this KML. The third
element is a mechanism for applying the rule-based knowledge while compiling and executing
a KBMS transaction against the knowledge base. A match-modify-execute (MME) cycle uses
rules to modify KML operations in a KBMS transaction during compilation. The MME
cycle incorporates rules into KBMS transactions during both compilation and execution. The
operation of the MME cycle has been verified in a production system environment.

Our  approach  of  using  a  single  KML  to  specify  operations  and  rules  and  the  MME  cycle 
mechanism  that  executes  these  operations  and  rules  against  the  knowledge  base  supports  the 
similarity in database and rule processing functions of the KBMS. Consequently, implementation
techniques from existing database technology can be tailored for use in the integrated
KBMS and we have examined two techniques. The first technique is the interleaved execution
of concurrent transactions and we studied its effect on the execution efficiency of KBMS
transactions. The second technique is query optimization exploiting decomposition, intermediate
result sharing and pipelined execution. An analytical study on the benefits of using
these  optimization  techniques  while  evaluating  a  linear  recursive  query  is  presented  in  this 
dissertation. 
CHAPTER I
INTRODUCTION 

As  individual  technologies  mature,  it  is  often  beneficial  to  integrate  technologies  either 
to solve problems that cannot be solved by each individual technology or to solve these problems
more efficiently. The research presented in this dissertation relates to the integration of
technologies  leading  to  a  new  technology,  namely  knowledge  base  management  systems. 

A  knowledge  base  management  system  (KBMS)  represents  a  new  technology  that  has 
recently  emerged  from  the  merging  of  two  existing  technologies,  namely  artificial  intelligence 
(AI)  and  database  management  systems  (DBMS).  KBMS  technology  benefits  from  the 
knowledge  representation  techniques,  the  deductive  problem  solving  capability,  the  enhanced 
query languages, the explanation facility that follows a particular line of reasoning, etc., supported
by an AI reasoning system or an expert system. It also benefits from the efficient and
sophisticated  management  of  a  large  database,  the  enforcement  of  reliability,  security  and 
integrity,  the  efficient  implementation  techniques  that  exploit  query  optimization  methods, 
the  use  of  parallel  processing  to  support  concurrent  execution  of  database  transactions,  etc., 
supported  by  a  DBMS.  Other  advantages  of  merging  these  technologies  include  the  use  of 
semantic  knowledge  for  query  processing  and  optimization,  the  support  of  intelligent  user 
interfaces,  etc.,  in  the  KBMS. 

Various application domains that use expert knowledge when processing large quantities
of data can benefit from the new KBMS technology. Computer aided design of VLSI circuits
and the engineering design process in a manufacturing environment are two examples of
these domains. The layout of a single VLSI circuit or a design database in a manufacturing
environment involves several megabytes of data. Fragments of this data will be accessed and
modified by various users and tools and it would require all of the features provided by conventional
DBMS technology to manage these databases.

At the same time, the complex process of design and testing could benefit from AI technology
as well. Expert knowledge guiding these tasks may be in the form of rules and constraints.
AI reasoning techniques can be used to apply the relevant rules when processing the
data  and  to  maintain  consistency  of  the  circuit  layout  or  the  design  data  with  respect  to  the 
constraints.  Various  heuristics  supported  by  AI  technology  can  also  be  used  to  make  these 
tasks  more  manageable  and  efficient. 

Several  approaches  for  merging  AI  and  DBMS  technologies  have  been  suggested 
[BRO84, GAL83, JAR84b, VAS85 and WHA87]. One approach is to build an interface or a
bridge between a database processor that manages a database of facts or assertions (the
extensional database) and an inference processor that manipulates deductive rules (the intensional
database). This approach is used in JAR84a, KEL82 and KEL84 and is discussed in
Chapter  Two.  The  main  disadvantage  of  this  "separate  but  equal"  approach  to  knowledge 
management  is  the  use  of  separate  representation  schemes  for  facts  and  rules;  this  makes  it 
cumbersome  and  expensive  to  use  the  rules  for  processing  the  extensional  database. 

Other  approaches  to  implementing  a  KBMS  either  enhance  a  DBMS  with  deductive 
power  so  that  the  search  portion  of  a  reasoning  system  can  be  moved  into  the  DBMS 
[STO83, WON84 and STO84] or enhance a logic programming system with database facilities
[WAR84]. Although these approaches may achieve some of the necessary functionality
of a KBMS, it is difficult to extend an existing system to handle requirements outside the original
system specifications.

In  this  dissertation,  we  describe  our  approach  for  integrating  these  technologies  for  the 
purpose  of  designing  and  implementing  a  KBMS.  We  follow  the  object-oriented  paradigm  as 
proposed  in  KER84b,  WIE83  and  WIE84  and  discussed  in  Chapter  Two.  A  primary  feature 
of the KBMS is an object-oriented knowledge representation model which defines the structure,
operations and rules for the object types of a single integrated knowledge base.
Problem solving knowledge is captured by the rules defined for the object types. The
object  types  provide  a  natural  structuring  for  knowledge,  i.e.,  they  provide  a  binding 
between  facts  and  relevant  rules;  this  has  been  identified  as  a  desired  property  of  a  KBMS 
[WIE83  and  WIE84].  The  integrated  KBMS  also  supports  a  single  knowledge  manipulation 
language  (KML)  which  is  used  both  to  specify  operations  that  manipulate  facts  and  as  a  rule 
language  to  express  problem  solving  knowledge  relevant  to  the  facts. 

In comparison with the other approaches, ours comes closest to integration in the full sense:
we combine a DBMS and an AI rule processing system into a single integrated KBMS, and this
integration takes place at both the representation and functional levels.

To  elaborate,  integration  at  the  representation  level  means  that  the  KBMS  provides  a 
uniform  representation  framework  capable  of  defining  the  facts  corresponding  to  the  DBMS 
as well as the knowledge (rules) used by the AI reasoning system to solve problems. Additionally,
a single knowledge manipulation language (KML) provides a unified scheme to
express  both  operations  and  rules  defined  for  the  object  types. 

Integration  at  the  representation  level  leads  to  functional  integration  of  the  KBMS 
components.  The  DBMS  component  that  processes  the  operations  and  the  AI  reasoning 
component  that  processes  the  rules  use  the  same  KML  constructs  to  manipulate  the  object 
types of the integrated knowledge base. This common characterization helps identify common
functionality among the different components and this leads to functional integration.
Functional  integration  in  the  KBMS  eliminates  functional  redundancy  of  the  components.  It 
also  leads  to  an  efficient  implementation  of  the  KBMS  since  implementation  techniques  that 
have  been  successful  in  either  DBMS  or  AI  technology  can  be  applied  to  the  functionally 
integrated  KBMS. 

This  dissertation  is  organized  as  follows:  Chapter  Two  provides  a  brief  survey  of  the 
relevant  literature.  Chapter  Three  outlines  the  architecture  of  the  integrated  KBMS  and 
lists  its  desired  features.  We  show  how  a  technique  of  incorporating  rules  into  the  semantic 
association  model  SAM*  [SU83  and  SU85],  an  object-oriented  semantic  data  model  currently 
under development at the University of Florida Center for Database Research, is used to
provide  an  object-oriented  framework  for  knowledge  representation.  Chapters  Four  through 
Seven  deal  with  several  important  aspects  in  the  design  of  the  integrated  KBMS,  as  outlined 
in  Chapter  Three. 

Chapter  Four  deals  with  the  semantics  of  the  knowledge  manipulation  language  (KML). 
We  describe  the  use  of  the  KML  to  specify  operations  that  manipulate  the  object  types  and 
to  express  rules.  We  discuss  different  categories  of  KML  constructs  and  show  how  rules 
expressed  using  these  constructs  can  explicitly  specify  declarative  and  operational  (process 
oriented  or  procedural)  semantics  of  knowledge.  We  introduce  two  main  categories  of  rules, 
namely  value  independent  rules  and  value  dependent  rules  and  discuss  their  differences  and 
the  advantages  of  classifying  rules.  We  also  discuss  how  rules  can  be  used  to  support  both 
forward  and  backward  inference  chains. 

Chapter  Five  presents  an  example  knowledge  base.  In  this  chapter,  we  describe  how 
semantic  features  found  useful  in  modeling  knowledge  can  be  captured  by  the  object  types  of 
our  knowledge  representation  model.  First  we  discuss  semantic  features,  from  both  DBMS 
and  AI  literature,  that  have  been  found  useful  in  modeling  knowledge  from  diverse  domains. 
Next,  we  describe  some  of  the  different  association  types  of  SAM*,  the  semantic  data  model 
that  we  use  in  the  design  of  our  knowledge  representation  model.  We  then  show  how  these 
association  types  and  the  rules  defined  for  them  can  be  used  to  construct  the  object  types 
which  in  turn  capture  the  previously  identified  semantic  features. 

A  mechanism  for  applying  the  rules  that  capture  problem  solving  knowledge  is  critical 
to the success of the integrated KBMS. We use a transaction oriented paradigm to characterize
processing in the KBMS. A transaction is typically a sequence of KML operations
which  are  executed  against  the  object  types  of  a  knowledge  base.  A  match-modify-execute 
(MME)  cycle  represents  the  mechanism  of  applying  rules,  defined  for  these  object  types, 
while  executing  this  KBMS  transaction,  and  it  is  described  in  Chapter  Six. 
The different categories of rules are treated differently in the MME cycle. For example,
value independent rules that capture operational semantics are matched against the transaction
and are used to modify the transaction prior to execution. The modifications can incorporate
further operations or specify some operations to be conditionally executed. In contrast,
value dependent rules that capture declarative semantics are incorporated into the
transaction to be executed against the knowledge base; these rules are explicitly selected for
execution.

During execution, too, the transaction can be modified and operations or value dependent
rules may be incorporated; this causes the MME cycle to be called recursively. Committing
the changes made to the knowledge base by the transaction can result in certain conditions
being satisfied; this leads to the implicit selection and execution of other value dependent
rules that also capture declarative semantics. Details of the MME cycle are discussed in
Chapter Six. We also describe a prototype of the MME cycle implemented in the OPS5 production
system environment [FOR81].
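
The match and modify steps described above can be illustrated with a minimal Python sketch. It is not the OPS5 prototype and it does not use KML. The names `Operation`, `Rule`, `match_and_modify`, `execute`, `guard_stock` and `add_log` are hypothetical, and the knowledge base is assumed to be a plain dictionary. Only the compile-time part of the cycle is modeled: rules are matched against the operations of a transaction and used either to make an operation conditional or to incorporate a further operation. Recursive invocation during execution and the implicit selection of rules at commit time are left out.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, List, Optional

KnowledgeBase = Dict[str, Any]


@dataclass(frozen=True)
class Operation:
    """One operation of a transaction (hypothetical stand-in for a KML operation)."""
    name: str
    object_type: str
    amount: int = 0
    condition: Optional[Callable[[KnowledgeBase], bool]] = None


@dataclass(frozen=True)
class Rule:
    """A rule defined for an object type; `modify` rewrites a matched operation."""
    object_type: str
    operation: str
    modify: Callable[[Operation], List[Operation]]


def match_and_modify(transaction: List[Operation], rules: List[Rule]) -> List[Operation]:
    """Match each rule against each operation and splice in the rewritten operations."""
    modified: List[Operation] = []
    for op in transaction:
        ops = [op]
        for rule in rules:
            rewritten: List[Operation] = []
            for candidate in ops:
                if (rule.object_type, rule.operation) == (candidate.object_type, candidate.name):
                    rewritten.extend(rule.modify(candidate))   # match, then modify
                else:
                    rewritten.append(candidate)
            ops = rewritten
        modified.extend(ops)
    return modified


def execute(transaction: List[Operation], kb: KnowledgeBase) -> List[str]:
    """Run the modified transaction; skip operations whose condition fails."""
    executed: List[str] = []
    for op in transaction:
        if op.condition is not None and not op.condition(kb):
            continue
        if op.name == "withdraw":
            kb[op.object_type] -= op.amount
        elif op.name == "log":
            kb.setdefault("log", []).append(op.object_type)
        executed.append(op.name)
    return executed


def guard_stock(op: Operation) -> List[Operation]:
    """Make the operation conditional on enough stock being present."""
    return [replace(op, condition=lambda kb: kb[op.object_type] >= op.amount)]


def add_log(op: Operation) -> List[Operation]:
    """Incorporate a further operation after the matched one."""
    return [op, Operation("log", op.object_type)]
```

With the two rules attached to a hypothetical object type `Part`, a transaction of two withdrawals is expanded to four operations before execution, and the second withdrawal is skipped at run time if its condition no longer holds.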

The  OPS5  prototype  of  the  MME  cycle  highlights  several  implementation  issues  which 
lead to the discussions in the following chapters. In Chapter Seven, we first introduce a performance
measure for the MME cycle implementation and review available techniques to
process  rules.  We  discuss  how  different  categories  of  rules  defined  for  an  object  type  can  be 
effectively  structured  to  reflect  and  exploit  differences  in  usage  of  these  rules  in  the  MME 
cycle.  We  also  discuss  how  the  context,  i.e.,  those  object  types  that  are  relevant  to  a  rule, 
can  be  determined. 

The  focus  of  Chapters  Seven  and  Eight  is  the  functional  integration  of  the  DBMS  and 
AI  reasoning  system  components  within  the  KBMS.  One  approach  to  functional  integration 
is  to  characterize  these  components  using  common  functions.  We  characterize  the  execution 
of rules using DBMS retrieval and storage manipulation functions. This lets us study the
migration of proven, efficient techniques from a DBMS to the functionally integrated
KBMS.
Chapter Seven deals with the increased efficiency resulting from the interleaved execution
of concurrent transactions. If we can identify and isolate KBMS transactions that can
be executed in parallel, then we can benefit from concurrency in the KBMS. The approach
taken is, first, to identify sources of parallelism and isolate a set of independent transactions
that can be executed in parallel. Then, we extend the serializability criterion for the correctness
of concurrent DBMS transactions to the KBMS. We prove that the concurrent execution
of a set of KBMS transactions is equivalent to a particular serial execution of the same
set.  We  also  examine  DBMS  concurrency  control  algorithms,  such  as  two  phase  locking, 
from  the  viewpoint  of  KBMS  transactions. 
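
For readers unfamiliar with the serializability criterion, the following Python sketch shows the classical conflict-serializability test for ordinary DBMS transactions: build a precedence graph from conflicting reads and writes and look for a serial order consistent with it. It is the textbook DBMS test, not the extension to KBMS transactions developed in Chapter Seven. The schedule encoding (tuples of transaction name, `"r"` or `"w"`, and data item) and the function names are assumptions made for this illustration.

```python
from typing import List, Optional, Set, Tuple

Step = Tuple[str, str, str]   # (transaction, "r" or "w", data item)


def precedence_edges(schedule: List[Step]) -> Set[Tuple[str, str]]:
    """Add an edge Ti -> Tj when a step of Ti conflicts with a later step of Tj."""
    edges: Set[Tuple[str, str]] = set()
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            # Two steps conflict if they touch the same item, belong to
            # different transactions, and at least one of them is a write.
            if t1 != t2 and x1 == x2 and "w" in (a1, a2):
                edges.add((t1, t2))
    return edges


def serial_order(schedule: List[Step]) -> Optional[List[str]]:
    """Return an equivalent serial order, or None if the precedence graph is cyclic."""
    txns = sorted({t for t, _, _ in schedule})
    edges = precedence_edges(schedule)
    indegree = {t: 0 for t in txns}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [t for t in txns if indegree[t] == 0]
    order: List[str] = []
    while ready:                       # topological sort of the precedence graph
        t = ready.pop(0)
        order.append(t)
        for src, dst in sorted(edges):
            if src == t:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return order if len(order) == len(txns) else None
```

A schedule in which two transactions touch disjoint items yields a serial order, while a schedule in which T2 writes an item between a read and a write of the same item by T1 yields `None`.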

In  Chapter  Eight,  we  examine  ways  to  further  benefit  from  functional  integration  in  the 
KBMS  through  the  use  of  DBMS  query  optimization  techniques.  A  KBMS  transaction  that 
generates  the  transitive  closure  of  a  relation  is  an  example  of  the  use  of  a  deductive  rule  that 
generates  new  information.  Transitive  closure  is  an  example  of  a  linear  recursive  query 
which  cannot  be  evaluated  by  a  conventional  DBMS.  In  Chapter  Eight,  we  show  that  in  the 
functionally  integrated  KBMS,  the  set  of  resolvents  generated  by  the  rule  processing  system 
of  the  KBMS  using  a  linear  recursive  rule  can  be  treated  as  a  set  of  concurrent  KBMS 
retrievals. We then use query optimization techniques, based on query decomposition, intermediate
result sharing and pipelining, developed for use in a DBMS, to evaluate these
retrievals  in  a  KBMS.  This  leads  to  an  efficient  evaluation  strategy  for  linear  recursive 
queries  and  illustrates  how  DBMS  strategies  can  be  used  to  support  processing  of  rules  in  an 
integrated  KBMS. 
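
To make the notion of a linear recursive query concrete, the sketch below computes a transitive closure in Python using semi-naive iteration, a standard technique that is substituted here for illustration and is not the evaluation strategy developed in Chapter Eight. Each round joins only the tuples derived in the previous round with the base relation, which is a very simple form of reusing intermediate results. The function name and the encoding of the relation as a set of pairs are assumptions.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]


def transitive_closure(edges: Set[Pair]) -> Set[Pair]:
    """Evaluate the linear recursive rule
         path(X, Y) <- edge(X, Y)
         path(X, Y) <- path(X, Z), edge(Z, Y)
    by semi-naive iteration."""
    closure = set(edges)
    delta = set(edges)            # tuples derived in the previous round
    while delta:
        # Join only the new tuples with the base relation.
        derived = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = derived - closure
        closure |= delta
    return closure
```

The loop stops as soon as a round derives no new tuple, so it also terminates on cyclic relations.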

Chapter  Nine  summarizes  the  research  presented  in  this  dissertation  and  suggests  areas 
for future research. The research described here introduces techniques for integrating database
management and artificial intelligence technologies so that intelligent systems with large
databases  and  rule  bases  can  be  built  elegantly  to  run  efficiently.  An  object-oriented  KBMS 
can  be  used  as  the  foundation  to  model  and  build  these  intelligent  systems. 




CHAPTER  II 
A  SURVEY  OF  RELATED  WORK 

Research  in  the  integration  of  technologies  requires  surveying  several  related  subject 
areas.  In  our  study  of  KBMS  technology,  we  investigated  database  management  systems  and 
techniques,  artificial  intelligence  techniques,  knowledge  representation,  expert  systems  and 
constraint  management,  to  name  a  few.  A  complete  literature  review  would  be  considerably 
longer  than  this  entire  dissertation.  In  this  chapter,  we  survey  a  few  topics  of  general 
interest that have a bearing on the design of an object-oriented KBMS. We delay the discussion
of specific research to later chapters.

We review related research in knowledge representation, the object-oriented programming
paradigm, the role of constraints in knowledge management and approaches other than
our  own  for  integrating  DBMS  and  AI  technologies. 

2.1    Knowledge  Representation 

The representation of knowledge is a key issue in determining the structure and organization
of knowledge bases that support efficient knowledge management. Various knowledge
representation  techniques  have  been  studied  in  artificial  intelligence  research  and  these 
schemes  can  be  broadly  classified  into  either  declarative  schemes  or  procedural  schemes. 

The declarative schemes include logical, network and frame based representations. The
advantages of logical schemes [BRO84 and MYL81] are the simple syntax and well defined
semantics of logical formulae as well as the generality of inference and proof procedures that
manipulate the logical formulae. The disadvantages are the lack of organizational principles
resulting in unstructured knowledge and the inability to conveniently express procedural or
heuristic knowledge. An example of a network representation is a semantic net [QUI68 and
SCH76] which models objects and the (binary) relationships between objects. Network
schemes in general lack the formal semantics of logical schemes but have a natural graphical
representation and provide a means for organizing information. Frame based representation
schemes [WIN75] are used to model a stereotypical situation with complex structures and
provide a framework for developing other representation models. Various adaptations of
frames include FRL [GOL77], KRL [BOB77] and OWL [SZO77].

In  contrast,  the  procedural  or  process  oriented  knowledge  representation  schemes,  while 
lacking  in  formal  semantics,  allow  for  direct  and  efficient  interaction  between  knowledge 
facts  and  rules.  However,  this  same  interaction  results  in  meta-information  being  embedded 
in the control structures of the system and the inability to easily understand and modify procedural
schemes [KER84b and MYL81]. Pattern directed inference systems such as
PLANNER  [HEW71  and  HEW72]  and  CONNIVER  [SUS72]  are  examples  of  procedural 
knowledge  representation  systems.  These  systems  can  be  classified  on  the  basis  of  procedure 
activation  mechanisms  and  control  structures  offered.  Production  systems  [DAV75,  NEW73 
and WAT79] such as OPS5 [FOR81] have sometimes been classified as procedural schemes
[BRA85  and  MYL81].  However,  production  systems  stress  modularity  of  productions;  hence, 
they  do  not  support  direct  interaction  (communication  or  control)  between  productions. 

The  distinction  between  declarative  and  procedural  or  process  oriented  information  is 
revisited  in  MOR84.  Knowledge  and  expertise  are  perceived  as  consisting  of  representation 
and  performance  components.  Paralleling  the  distinction  made  earlier  in  MYL81  and 
STE80,  the  representation  component  contains  the  factual  information  and  statements  of 
relationships that define the desired result [MOR84]. This can be equated to declarative
semantics. The performance component then deals with the strategy and tactics for manipulating
and combining this information to achieve results efficiently [MOR84]. This can be
viewed  as  the  procedural  or  process  oriented  component. 
A complete knowledge representation scheme must be able to structure and organize
knowledge.  It  must  include  a  declarative  framework  to  describe  the  relationships  existing 
between  different  knowledge  components.  On  the  other  hand  it  must  capture  process 
oriented  knowledge  that  specifies  the  context,  procedural  methods,  priorities,  scheduling 
information,  etc.,  that  is  needed  to  make  a  system  efficient.  In  this  dissertation,  we  refer  to 
the  process  oriented  or  procedural  component  as  the  operational  component  of  knowledge. 
The  object-oriented  programming  paradigm  which  we  now  discuss  has  the  capability  to 
organize  knowledge  and  to  represent  both  declarative  and  operational  information. 

2.2    Object-Oriented  Programming  Environments 

The  object-oriented  programming  paradigm,  first  introduced  in  SIMULA  67  [DAH68] 
and  later  popularized  by  SmallTalk-80  [GOL83],  has  had  a  considerable  impact  on  modeling 
and  managing  knowledge;  one  of  its  attractions  is  that  it  allows  the  incorporation  of  concepts 
from other programming paradigms such as logic, functional programming, rule-based production
systems, etc. It has been proposed in KER84b that this feature can be exploited so
that object-oriented knowledge representation models can uniformly handle the representation
of both declarative and process oriented or operational information.

Traditionally,  a  program  was  characterized  as  follows: 

program = data structure + algorithm

The  object-oriented  paradigm  emerged  from  the  necessity  to  structure  program 
knowledge by encapsulating data structures and algorithms into complex objects and to support
the direct representation and processing of these complex objects.

An  object-oriented  model  has  two  main  features: 

(a) It separates the specification and representation of objects; this is the information hiding
aspect.

(b) It controls the interface to the objects through standardized access methods; this is the
encapsulation aspect. These methods correspond to high level operations that are
closer to real world operations than the lower level primitive data manipulation operations
used to implement them.

An  important  organizational  feature  of  an  object-oriented  model  is  its  class  hierarchy. 
Objects are organized into classes; each class can be derived from a super-class and/or each
class can be further organized into sub-classes. The class hierarchy is an inheritance hierarchy,
and the specification and representation can be inherited into a class from its super-class(es).
Each object class or object type can be instantiated to have several object
occurrences  or  instances.  Objects  can  also  be  composed  of  other  objects  in  a  component 
hierarchy  which  is  separate  from  the  inheritance  hierarchy. 
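
The difference between the inheritance hierarchy and the component hierarchy can be seen in a small Python sketch. The classes `DesignObject`, `Gate` and `Circuit` are hypothetical examples loosely inspired by the VLSI design domain mentioned in Chapter One; they are not object types of the knowledge representation model described in this dissertation.

```python
class DesignObject:
    """Root of the inheritance hierarchy; sub-classes inherit its specification."""

    def __init__(self, name: str) -> None:
        self._name = name              # representation hidden behind access methods

    def describe(self) -> str:
        return f"{type(self).__name__}({self._name})"


class Gate(DesignObject):
    """A sub-class: inherits describe() and adds its own access method."""

    def __init__(self, name: str, inputs: int) -> None:
        super().__init__(name)
        self._inputs = inputs

    def fan_in(self) -> int:
        return self._inputs


class Circuit(DesignObject):
    """Also a sub-class of DesignObject, but composed of Gate occurrences.
    The list of parts forms the component hierarchy, which is separate
    from the inheritance hierarchy."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self._parts = []

    def add(self, part: Gate) -> None:
        self._parts.append(part)

    def total_fan_in(self) -> int:
        # A high level operation implemented with lower level ones.
        return sum(part.fan_in() for part in self._parts)
```

A `Circuit` is a `DesignObject` by inheritance and has `Gate` occurrences by composition; it is not itself a `Gate`.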

KEE [INT84], KL-ONE [MOS83 and SCH83], LOOPS [STE83], PRISM [KER84a and
KER84b] and STROBE [LAF84 and SMI83] are examples of systems that are object-oriented.

Our  knowledge  representation  model  conforms  with  the  object-oriented  paradigm.  We 
exploit  the  organizational  features,  i.e.,  the  inheritance  and  component  hierarchies  and  the 
encapsulation  feature  of  this  paradigm,  as  discussed  in  Chapter  Three. 

2.3    The Role of Constraints in Knowledge Management

The  use  of  constraints  as  a  unifying  paradigm  in  expert  systems,  DBMS  and  knowledge 
representation  systems  has  been  suggested  in  MOR84.  Constraints  provide  a  convenient  way 
for  expressing  relationships  that  must  hold  between  different  pieces  of  information.  The  role 
of  constraints  as  a  means  of  expressing  knowledge  has  been  recognized  and  exploited  in 
several ways [BRO78, CHA84, FUT84, HAM75, HAM76, HAM80, KER84b, MIN83,
MOR84,  RAS84  and  XU83]. 

In  DBMS,  constraints  can  provide  internal  consistency  of  semantic  data  models,  data 
security  and  data  integrity.   Constraints  can  also  act  as  a  search  heuristic  and  can  be  applied 
to queries in order to generate semantically equivalent but more efficient queries or to terminate
queries that violate the constraint.

In artificial intelligence applications such as expert systems, constraints can specify relationships
between problem specifications, goals, etc. For example, constraints are used
effectively in a truth maintenance system [DOY79]. In CHA84 and RAS84 we see the use of
integrity  constraints  in  semantic  query  optimization.  Based  on  the  notion  of  subsumption 
and  partial  subsumption  [CHA84],  integrity  constraints  are  used  to  optimize  queries. 
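
A deliberately simple Python sketch may help to show how an integrity constraint can shorten or terminate a query. It is far simpler than the subsumption-based method of CHA84 and is not taken from that work. The assumed constraint is that every stored value lies within known bounds, and the assumed query selects the values greater than a threshold; the function name and the status strings are hypothetical.

```python
from typing import List, Tuple


def select_greater(rows: List[int], threshold: int,
                   bounds: Tuple[int, int]) -> Tuple[List[int], str]:
    """Answer 'value > threshold' using the integrity constraint lo <= value <= hi."""
    lo, hi = bounds
    if threshold >= hi:
        # The query contradicts the constraint: no stored value can qualify,
        # so the query is terminated without scanning the data.
        return [], "terminated"
    if threshold < lo:
        # The constraint implies the predicate: every row qualifies, so the
        # predicate can be dropped from the query.
        return list(rows), "predicate dropped"
    return [value for value in rows if value > threshold], "scanned"
```

Only the third case needs to test the predicate against each row; the first two are answered from the constraint alone.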

The  process  of  constraint  management  has  several  components.  The  first  component  is 
involved  with  the  specification  of  the  constraint.  The  second  concerns  the  mechanism  used 
for  checking  if  the  constraint  has  been  violated  and  the  third  component  is  responsible  for 
maintaining  the  consistency  of  the  database  with  the  constraints. 
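
The three components can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of any of the systems cited here: the database is a plain dictionary, and the names `Constraint`, `holds`, `repair` and `enforce` are hypothetical. The `holds` predicate plays the role of the specification, the test inside `enforce` is the checking mechanism, and `repair` is the action that restores consistency.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Database = Dict[str, Any]


@dataclass
class Constraint:
    name: str
    holds: Callable[[Database], bool]     # specification: what must be true
    repair: Callable[[Database], None]    # maintenance: how to restore it


def enforce(db: Database, constraints: List[Constraint]) -> List[str]:
    """Check every constraint and repair the database where one is violated."""
    violated: List[str] = []
    for constraint in constraints:
        if not constraint.holds(db):      # checking
            violated.append(constraint.name)
            constraint.repair(db)         # maintenance
    return violated


# Hypothetical example constraint: a stored quantity may not be negative.
non_negative = Constraint(
    name="non_negative_quantity",
    holds=lambda db: db["quantity"] >= 0,
    repair=lambda db: db.update(quantity=0),
)
```

Keeping the predicate and the repair action as two separate fields makes both the declarative and the operational part of the constraint explicit.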

In  MOR84,  Constraint  Equations  (CEs)  are  developed  in  conjunction  with  the  KL-ONE 
object-oriented  knowledge  representation  system.  The  information  expressed  in  the  CEs  is 
the  declarative  component  and  for  a  subset  of  the  CEs  a  prototype  compiler  automatically 
generates  a  set  of  condition-action  rules  which  maintain  these  constraints.  In  this  system, 
however, the CEs cannot always express in a declarative way the complete operational information
required to maintain consistency. To overcome this drawback, operational information
is implicitly expressed in the CEs. A more elegant solution is to explicitly specify both
the  declarative  and  process  oriented  or  operational  semantics  of  the  constraint;  this  is  the 
approach  we  have  used  in  our  specification  of  rules  as  will  be  discussed. 

Constraints  are  also  used  in  the  PRISM  architecture  [KER84a  and  KER84b]  which  is 
an  object-oriented  semantic  net  with  a  natural  clustering  of  rules  at  the  nodes.  Constraints 
are  expressed  in  the  Constraint  Language  (CL)  and  these  constraints  are  used  to  define, 
extend  and  populate  the  net  in  a  consistent  way.  Advantages  of  this  system  are  the  explicit 
specification  of  constraints  and  the  uniform  treatment  of  data  and  metadata  in  the  semantic 
net,  e.g.,  an  instance  or  occurrence  of  an  object  type  is  the  same  as  the  object  type.  The 
user  can  extend  the  system  (including  the  constraints)  and  the  inference  engine  validates 
system behavior by proving conjectures. New knowledge is added incrementally to the system.
Several aspects not addressed in this study include the use of deductive rules to derive
new  information,  the  support  of  recursion  and  examples  of  the  use  of  inference  chains  caused 
by  chaining  constraints  together.  Also,  there  is  no  mention  of  how  the  rules  are  stored  or 
manipulated  in  the  object-oriented  system.  These  are  issues  that  will  be  explored  in  this 
dissertation. 

2.4    Other Techniques for Integrating DBMS and AI Technologies

Several  approaches  for  merging  AI  and  DBMS  technologies  have  been  suggested  in 
BRO84, GAL83, JAR84b, VAS85 and WHA87. One approach is to build an interface or a
bridge between a database processor that manages a database of facts or assertions (the
extensional database) and an inference processor that manipulates deductive rules (the intensional
database). This approach is used in JAR84a, KEL82 and KEL84.

The  disadvantages  of  this  separation  stem  from  the  fact  that  there  are  different 
representations  for  facts  and  rules  and  the  two  systems  are  functionally  independent.  The 
specification of rules is independent of the database retrieval capability provided by the database
processor. As a result, the expressive power of the rules is limited by the rule
specification language; we cannot bring to bear the full capabilities of the database management
system to express the rules. The second drawback relates to efficiency considerations.
An inference plan is created using the intensional database and is verified using the extensional
database. Efficiency can be improved by verifying the inference plan at intermediate
stages  and  providing  feedback.  Such  verification,  however,  is  cumbersome  because  of  the 
separation  of  the  rule  base  and  fact  base. 

Another related problem, mentioned in KEL84, is that database operations, such as
aggregations, cannot be executed over deduced concepts not found in the fact base. As a
result, functions have to be duplicated in the two systems. Lastly, forward reasoning
requires intimate interaction between the fact base and the rule base since deductions are
made outward from the facts in the absence of a specific goal. This is difficult if the two
databases are on separate systems.

Another approach to implementing a KBMS is either to enhance a DBMS with deductive
power so that the search portion of a reasoning system can be moved into the DBMS
[STO83, STO84 and WON84] or to enhance a logic programming system with database
facilities [WAR84]. Although these approaches may achieve some of the necessary
functionalities of a KBMS, it is difficult to extend an existing system to handle requirements
outside the original system specifications.

Our  approach  to  the  design  of  a  KBMS,  which  we  outline  in  the  next  chapter,  is  closest 
to  the  meaning  of  integration.  It  has  neither  the  disadvantages  of  the  first  "interface" 
approach  nor  the  limitations  of  the  second  "extensions"  approach. 

The research topics to be reviewed in later chapters are as follows. In Chapter Five we
briefly discuss semantic data models and semantic features that are useful in modeling
knowledge. In Chapter Seven we review available techniques for implementing rules in the
OPS5 production system as well as in an extended version of the INGRES DBMS. We also
briefly discuss the serializability criterion of correctness for concurrently executed database
transactions. In Chapter Eight we discuss techniques for evaluating linear recursive queries
as well as database query optimization techniques.


CHAPTER  III
ARCHITECTURE  OF  THE  INTEGRATED  OBJECT-ORIENTED  KBMS 

In  this  chapter,  we  outline  the  architecture  of  our  integrated  object-oriented  KBMS. 
We  first  list  the  features  of  the  KBMS  and  discuss  why  the  design  of  the  KBMS  is  integrated 
and  object-oriented.  We  then  outline  the  approach  that  is  used  in  our  design.  Finally,  we 
explain  the  significance  of  the  material  to  be  presented  in  Chapters  Four  through  Eight  to 
the  design  of  the  KBMS. 

The  features  of  an  integrated  object-oriented  KBMS  are  as  follows: 

(1)  A  powerful  object-oriented  knowledge  representation  model.  The  model  is  capable  of 
defining  the  structure  of  the  object  types  to  be  stored  in  the  knowledge  base.  It  also 
supports  high  level  operations  by  which  the  object  types  will  be  accessed  and  manipu- 
lated and  rules  that  capture  problem  solving  knowledge. 

(2)  A  powerful  knowledge  manipulation  language  (KML)  to  be  used  in  conjunction  with  the 
model.  The  KML  is  used  to  specify  operations  for  accessing  and  manipulating  the 
object  types  in  a  standardized  manner  and  to  specify  the  semantic  information  captured 
by  the  rules. 

(3)  A  mechanism  for  applying  the  problem  solving  knowledge  that  is  captured  in  the  rules 
defined  for  the  object  types,  when  processing  a  transaction  in  the  KBMS. 

(4)  An implementation (of the KBMS) that fosters the functional integration of the DBMS
and the AI reasoning system components of the KBMS that process the knowledge base.

An object-oriented model defines a collection of object types and specifies the structure
of the facts to be stored with each object type, the structural relationships between object
types and the operations that access and manipulate each object type. An object-oriented
knowledge representation model must extend this definition to include a knowledge
component to capture problem solving knowledge relevant to each object type. Rules have
been used widely and effectively as the knowledge component in AI reasoning systems. We
propose that rules be used in our model, too, as the knowledge component of each object
type.

Unfortunately,  there  is  a  serious  shortcoming  in  most  rule  based  production  systems  or 
clausal  schemes  such  as  Prolog;  they  do  not  structure  the  rules  or  clauses.  This  results  in 
inefficient  inference  mechanisms.  For  example,  in  FOR82,  it  is  estimated  that  a  production 
system  such  as  OPS5  spends  90%  of  its  effort  in  selecting  appropriate  rules.  However,  both 
the  inheritance  and  component  hierarchies  of  an  object-oriented  model  can  be  used  to  con- 
struct a  framework  in  which  to  organize  and  provide  structure  to  rules.  As  a  result,  rules 
can  be  used  very  effectively  as  the  knowledge  component  in  our  object-oriented  model. 

We  extend  the  encapsulation  feature  that  allows  high  level  operations  to  be  defined  for 
each  object  type  and  we  allow  the  attachment  of  rules  to  the  definition  of  the  object  types, 
as  well.  Declarative,  operational  (or  procedural)  and  heuristic  knowledge  is  incorporated  into 
our  model,  via  rules.  For  example,  rules  are  used  to  describe  declarative  knowledge  about 
relationships  that  exist  between  object  types.  Rules  also  encapsulate  operational  information 
and  heuristic  knowledge  into  abstract  entities. 

By  using  the  object-oriented  paradigm,  we  are  able  to  structure  and  organize 
knowledge.  We  also  integrate  the  fact  base  and  rule  base  within  the  object  types  of  a  single 
knowledge  base.  Clustering  operations  and  rules  by  object  type  defines  the  context  for 
applying  these  operations  and  rules.  This  clustering  also  provides  a  binding  between  facts 
and  relevant  rules  within  the  object  type;  this  binding  is  an  important  property  of  a 
knowledge  representation  model. 

The integrated knowledge base is defined by a collection of object types. Each object
type is defined by its structure, operations and rules. Each object type can be instantiated to
have many instances or occurrences. Each occurrence of an object type will acquire the
definition of the object type; i.e., it will have the same structure, operations and rules that
define the corresponding object type. The object type and its occurrences are all knowledge
base objects.

Each object (type or occurrence) is unique and identifiable in the knowledge base; to
this end, each object is associated with a system generated object identifier (OID). Rules
are themselves objects. Rules defined for an object type apply to all occurrences of the
object type.
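The relationship between object types, occurrences, rules and OIDs described above can be sketched in Python. This is only an illustrative sketch, not the SAM*-based design of the dissertation: the class names, the `new_oid` counter and the `instantiate` method are hypothetical names introduced here, and the text states only that OIDs are system generated, not how.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, Dict, List

_oid_counter = count(1)  # hypothetical stand-in for a system OID generator


def new_oid() -> int:
    """Return a fresh object identifier (assumed here to be an integer)."""
    return next(_oid_counter)


@dataclass
class Rule:
    """A rule is itself a knowledge base object, so it carries its own OID."""
    name: str
    condition: Callable[[dict], bool]  # LHS
    action: Callable[[dict], None]     # RHS
    oid: int = field(default_factory=new_oid)


@dataclass
class ObjectType:
    """An object type is defined by its structure, operations and rules."""
    name: str
    attributes: List[str]
    operations: Dict[str, Callable] = field(default_factory=dict)
    rules: List[Rule] = field(default_factory=list)
    oid: int = field(default_factory=new_oid)

    def instantiate(self, **values) -> "Occurrence":
        """Create an occurrence that has exactly the structure of this type."""
        if set(values) != set(self.attributes):
            raise ValueError("values must match the attributes of the type")
        return Occurrence(self, values)


@dataclass
class Occurrence:
    """An occurrence acquires the definition of its object type."""
    object_type: ObjectType
    values: dict
    oid: int = field(default_factory=new_oid)

    @property
    def rules(self) -> List[Rule]:
        # Rules defined for the type apply to every occurrence of the type.
        return self.object_type.rules
```

In this sketch the rules are stored once, with the object type, and each occurrence reaches them through its type; this is one simple way to obtain the binding between facts and relevant rules.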

To provide this object-oriented framework for knowledge representation, we use a
technique of incorporating rules into a semantic data model [RAS85]. The semantic
association model SAM* [SU83 and SU85] was designed as a semantic data model for
engineering and scientific/statistical databases. A DBMS implementation of SAM* is
currently underway at the University of Florida Center for Database Systems Research and
Development.

SAM* derives its powerful representation capability from the variety of modeling
constructs that it supports. SAM* recognizes basic data types such as integer, complex data
types such as set, and abstract data types such as COMPUTE (to encapsulate a sequence of
executable operations) and RULE (corresponding to a rule). SAM* identifies several
semantic properties found useful in modeling engineering databases and then defines seven
association types to model these semantic properties.

Each of the association types in the SAM* model is defined by its structure (to
represent the facts which it stores) and operations (to specify how the stored facts are
manipulated). To accommodate our knowledge representation model, we add a knowledge
component; each of the association types is now defined by its rules, as well.

SAM* association types are the system object types inherent to the knowledge
representation model and are used as building blocks. When a knowledge base is being
designed for a particular application domain, the user object types specific to that domain
will be built using the system object types (SAM* association types).

A user object type is modeled using either a single system object type or a network of
system object types. The user object type is a sub-class of the system object type(s) and
inherits the structure, operations and rules defined for the system object type(s). In order to
capture the semantics and problem solving knowledge specific to an application domain, the
designer will extend the inherited definition of a user object type by defining additional
operations and rules for it.

Each  of  the  user  object  types  is  an  object  and  will  have  an  OID  value.  Similarly,  each 
of  the  user  object  types  is  instantiated  and  has  several  occurrences;  each  of  these  is  an  object 
with  an  OID  value.  Object  occurrences  are  instantiated  with  the  same  structure,  operations 
and  rules  that  are  defined  for  the  corresponding  user  object  type. 

Just  as  problem  solving  knowledge  can  be  specific  to  a  particular  application  domain  or 
user  object  type,  there  can  be  knowledge  that  is  specific  to  a  particular  occurrence  of  a  user 
object  type.  To  accommodate  this,  the  user  object  type  corresponding  to  this  object 
occurrence  is  defined  by  an  attribute  whose  data  type  is  RULE;  the  value  of  this  attribute 
for  any  occurrence  of  the  user  object  type  is  a  rule  that  is  specific  to  that  particular 
occurrence. 

Each  chapter  of  this  dissertation  deals  with  some  aspect  of  the  design  and  implementa- 
tion of  the  integrated,  object-oriented  architecture  for  a  KBMS  which  is  outlined  above. 

Rules that are defined for the object types and occurrences of the knowledge base are
an important component of our model. Thus, the focus of Chapter Four is a knowledge
manipulation language (KML) for expressing these rules. In that chapter we introduce some
KML constructs for manipulating the objects. The rules are expressed using these KML
constructs. The KML allows the rules to express (a) declarative semantics corresponding to
relationships between knowledge base object types and occurrences which must be
maintained and (b) operational semantics corresponding to strategies to maintain these
relationships. The KML constructs can also be used to express (c) access methods or high
level operations that are a part of the object type definition. The use of a single KML to
express the rules and to define operations that manipulate objects is an important feature
and its consequences will be discussed later.

Chapter Five provides examples of user object types in an application domain modeled
using the system object types (SAM* association types) and their rules. Although by no
means exhaustive, these examples illustrate the technique of incorporating rules into the
semantic data model to obtain our knowledge representation model. Semantic features
useful in modeling knowledge from diverse domains are modeled by the user object types of
the example knowledge base. The user object types are built using the system object types
and extended by the domain specific rules defined for the user object types.

The  third  feature  of  our  KBMS  is  a  mechanism  for  applying  the  knowledge  component 
during  transaction  processing  and  Chapter  Six  describes  this  mechanism.  A  KBMS  transac- 
tion is  a  sequence  of  operations  which  are  executed  against  the  object  types  and  occurrences 
of  the  integrated  knowledge  base.  The  rules  defined  for  these  object  types  must  be  applied 
while  processing  the  transaction.  This  mechanism  for  applying  rules  comprises  a  match- 
modify-execute  (MME)  cycle.  The  mechanism  exploits  the  binding  between  facts  and 
relevant  rules  within  the  object  types.  It  also  exploits  the  fact  that  the  rules  are  expressed 
using  the  same  KML  constructs  as  are  the  operations  of  the  transaction. 

The  final  feature  of  our  design  is  that  the  implementation  fosters  the  functional  integra- 
tion of  the  DBMS  component  that  processes  operations  and  the  AI  reasoning  component  that 
processes  rules.  Chapters  Seven  and  Eight  focus  on  implementation  to  support  functional 
integration. 

Integration  at  the  representation  level  of  the  model  is  based  on  the  following  features 
(previously  described): 

(a)  structuring  rules  using  the  inheritance  and  component  hierarchies  of  our  object-oriented 
model 

(b)  binding  facts  and  relevant  rules  within  the  object  types  of  the  integrated  knowledge 
base  using  the  encapsulation  feature 

(c)  using  a  single  KML  to  both  define  operations  that  manipulate  object  types  and  to 
express  rules. 


These  features  lead  to  functional  integration;  the  DBMS  and  AI  reasoning  components 
can  be  characterized  using  common  functions.  In  our  KBMS,  the  AI  reasoning  component  is 
characterized  using  DBMS  retrieval  and  storage  manipulation  functions.  Functional  integra- 
tion implies  that  either  AI  or  DBMS  techniques  can  be  applied  to  the  functionally  integrated 
KBMS. 

Chapter  Seven  studies  methods  to  (a)  organize  the  object  types  and  occurrences  and 
their  rules  in  the  knowledge  base  and  (b)  determine  the  context  of  a  rule  (the  relevant  object 
types  and  occurrences),  so  as  to  support  the  mechanism  for  applying  rules.  Chapter  Seven 
also  studies  the  increased  efficiency  resulting  from  the  interleaved  execution  of  concurrent 
transactions  in  the  KBMS.  Chapter  Eight  examines  ways  to  further  benefit  from  functional 
integration  through  the  use  of  DBMS  query  optimization  techniques  while  evaluating  a 
KBMS  transaction.  The  approach  taken  is  to  treat  the  set  of  resolvents  generated  by  a 
linear  recursive  query  as  a  set  of  concurrent  KBMS  retrievals  and  then  to  apply  DBMS  query 
optimization  techniques  to  efficiently  evaluate  these  retrievals  in  a  KBMS  transaction. 


CHAPTER  IV 
EXPRESSING  RULES  IN  A  KNOWLEDGE  MANIPULATION  LANGUAGE 

In  this  chapter,  we  first  discuss  desirable  semantics  to  be  captured  in  the  knowledge 
component  defined  for  the  object  types.  We  describe  different  categories  of  language  con- 
structs of  a  knowledge  manipulation  language  (KML)  and  different  categories  of  rules 
(corresponding  to  these  language  constructs).  We  show  how  rules  expressed  in  a  KML  can 
capture  these  desirable  semantics.  Finally,  we  discuss  the  advantages  of  classifying  the  rules 
and  how  these  rules  can  support  both  forward  and  backward  chains  of  inference. 

4.1    The  Semantics  Captured  in  Rules

In Chapter Two we reviewed the dichotomy between declarative and operational
(procedural) semantics from the viewpoint of knowledge representation schemes. A similar
distinction can also apply to the rules that are a component of our knowledge representation
model. These rules must be able to express diverse semantics. This includes integrity
constraints, deductive rules that generate new information, expert rules that capture problem
solving knowledge peculiar to an application domain, rules that support attribute
inheritance, etc.

Declarative semantics correspond to a well formed formula describing relationships that
must hold between object occurrences in the knowledge base and are the specification
component of semantics. In contrast, the information needed to check and maintain these
relationships constitutes the operational semantics. This operational component deals with
the strategy and tactics needed to achieve results efficiently [MOR84]; this includes
specifying the context, procedural methods, priorities, scheduling information, etc. Thus, to
be adequate, the knowledge component of an integrated KBMS must explicitly specify both
declarative and operational semantics [BRO84, KER84b and MYL81].

In KER84b, it is suggested that complete constraint formalisms must provide
information along the lines of WHAT, WHEN, WHERE and HOW. WHAT specifies the
conditions that must be satisfied by the knowledge base and corresponds closely to
declarative semantics. The WHEN and WHERE components specify when a constraint is to
be examined and in what context, and are related to the operational semantics of checking a
constraint. The HOW component specifies what actions must be taken to maintain the
relationship and requires information from both the declarative and the operational
components. The same analogy can be extended to rules in general.

To illustrate the importance of explicitly specifying both declarative and operational
semantics using rules, we use the example of a P*S_P object type with two attributes,
PART and SUB_PARTS. The object type models the relationship between a part and its
set of sub-parts. It is described in detail in Chapter Five, and Figure 5.1 models this object
type.

P*S_P  is  subject  to  a  uniqueness  constraint  based  on  the  attribute,  PART,  i.e.,  each 
occurrence  of  P*S_P  should  have  a  distinct  value  for  the  attribute  PART.  This  constraint 
can  be  expressed  declaratively  in  clausal  first  order  predicate  logic,  using  extensional  predi- 
cates corresponding  to  the  object  occurrences,  e.g.,  P*S_P(x,y),  and  evaluable  predicates  for 
functions  evaluated  against  these  occurrences,  e.g.,  EQUAL(x,y)  as  described  in  CHA84  and 
ULL85.   The  uniqueness  constraint,  expressed  in  this  form,  is  as  follows: 

    EQUAL(y,z)  <-  P*S_P(x,y),  P*S_P(p,z),  EQUAL(x,p)

This well formed formula expresses only the WHAT information or the specification
component. The operational information that must be specified includes the WHEN and
WHERE components, e.g., this uniqueness constraint must be checked before an INSERT
operation executes against the P*S_P object type. The HOW component may be that if the
constraint is violated (when p equals x but y and z are unequal), then the strategy to
maintain the constraint is to combine P*S_P(x,y) and P*S_P(p,z) into a single occurrence
P*S_P(x,q), where q is the union of the sets y and z. All the information related to these
different components must be made available in the form of rules, and it could require
several rules to specify this information completely.
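The WHAT, WHEN/WHERE and HOW components of the uniqueness constraint can be illustrated with a small Python sketch. It is not KML and not the rule set developed in later chapters; the representation of a P*S_P occurrence as a (PART, SUB_PARTS) pair and the function names `violates_uniqueness` and `insert` are assumptions made only for this illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Assumed representation: one P*S_P occurrence is a (PART, SUB_PARTS) pair.
Occurrence = Tuple[str, FrozenSet[str]]


def violates_uniqueness(kb: List[Occurrence]) -> bool:
    """WHAT: two occurrences with equal PART must have equal SUB_PARTS."""
    seen = {}
    for part, subs in kb:
        if part in seen and seen[part] != subs:
            return True
        seen.setdefault(part, subs)
    return False


def insert(kb: List[Occurrence], part: str, sub_parts: Iterable[str]) -> None:
    """WHEN/WHERE: the check runs before an INSERT against P*S_P.

    HOW: if an occurrence with the same PART already exists, the two are
    combined into a single occurrence whose SUB_PARTS is the set union.
    """
    new_subs = frozenset(sub_parts)
    for i, (existing_part, subs) in enumerate(kb):
        if existing_part == part:
            kb[i] = (existing_part, subs | new_subs)
            return
    kb.append((part, new_subs))
```

For example, inserting ("engine", {"piston"}) and then ("engine", {"valve"}) through `insert` leaves one occurrence for "engine" whose sub-part set contains both values, so `violates_uniqueness` never becomes true for a knowledge base that is updated only through `insert`.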

4.2   Rules  Expressed  in  a  KML

In this section, we show that using a single knowledge manipulation language (KML) to
specify operations that manipulate objects and to express rules provides a powerful rule
language that can explicitly specify both declarative and operational semantics.

The  KML  constructs  are  an  extension  of  conventional  data  manipulation  language 
(DML)  constructs.  The  KML  includes  both  set-theoretic  and  algebraic  operations  that  can 
be  executed  against  the  object  occurrences  of  the  knowledge  base,  either  for  retrieval  or  for 
storage  manipulation.  Two  new  operations,  namely  the  EXECUTE  operation  and  the 
DERIVE  operation  are  introduced  and  described  in  this  chapter.  The  KML  supports  a 
variety  of  high  level  programming  constructs  such  as  FOR,  WHILE  and  REPEAT  loops,  IF- 
THEN-ELSE  or  CASE  statements,  etc.  It  also  supports  the  use  of  evaluable  functions,  e.g., 
GREATER-THAN,  EQUAL,  etc.,  and  set-oriented  functions,  e.g.,  MEMBER_OF, 
SUBSET_OF,  etc.,  that  return  a  truth  value  after  retrieving  one  or  more  object  occurrences 
from  the  knowledge  base. 

While  the  specification  and  design  of  a  KML  is  important,  it  deals  with  issues  in 
language  design  and  is  beyond  the  scope  of  our  research.  We  are  concerned  with  using  the 
KML  to  specify  rules,  and  so,  we  limit  ourselves  to  the  functionality  of  a  language  construct 
rather  than  the  specific  construct. 

The constructs of the KML are classified into different categories corresponding to the
functionality supported by each construct, as follows:

(1)  Constructs which describe a relationship between object occurrences in the knowledge
base.

(2)  Constructs which represent the execution of KML operations.

(3)  Constructs which involve communication between the KBMS and the user or an
application program.

(4)  Constructs which correspond to the execution of another rule.

The rules, expressed using KML constructs, have the structure
<IF condition THEN consequent action> or (LHS, RHS) as often used in production
systems [DAV75 and NEW73]. The conditional left hand side (LHS) must be satisfied before
the right hand side (RHS) consequent can be applied. There are several differences between
these rules expressed in the KML and productions of a production system. These differences
are with respect to the interactions between the rules, as discussed in this chapter, and the
mechanism for selecting rules, as discussed in Chapter Six.

Both  the  conditional  LHS  and  the  consequent  RHS  of  the  rules  are  expressed  using  the 
KML  constructs  in  any  of  the  above  categories.  Each  category  of  KML  constructs  supports 
a  different  functionality;  consequently,  rules  expressed  using  these  constructs  capture 
different  semantics.  Thus,  based  on  the  category  of  the  KML  constructs  on  either  the  LHS 
or  the  RHS,  we  classify  rules  into  different  categories. 

For  example,  depending  on  the  category  of  the  LHS  conditional  construct,  rules  are 
categorized  as  follows: 

(a)  value  dependent  rules  and 

(b)  value  independent  rules. 

Value dependent rules capture declarative semantics describing a relationship between
knowledge base object occurrences. On the other hand, value independent rules capture
operational semantics. These rules test the execution status of KML operations or other
rules and capture triggering or scheduling information.

Based on the category of the RHS consequents, rules are categorized as follows:

(a)  rules  which  derive  new  information  (or  deductive  rules), 

(b)  rules  which  execute  actions  and 

(c)  rules  which  build  explicit  inference  chains  by  executing  other  rules. 

Several  rules  from  different  categories  are  used  together  to  capture  complex  problem 
solving  information  for  a  particular  domain  as  will  be  seen  in  the  next  chapter.  In  the  rest  of 
this  chapter  we  describe  the  different  categories  of  rules,  in  detail,  and  discuss  some  advan- 
tages of  classifying  them. 
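The two classifications above can be summarized in a short Python sketch. The enumeration mirrors the four construct categories listed earlier in this section; the `Rule` record, the `derives` flag (standing in for a DERIVE operation on the RHS) and the mapping used in `classify_rhs` are assumptions made for illustration and are not definitions taken from the KML.

```python
from dataclasses import dataclass
from enum import Enum


class Construct(Enum):
    """The four categories of KML constructs."""
    RELATIONSHIP = 1    # relationship between object occurrences
    OPERATION = 2       # execution of a KML operation
    COMMUNICATION = 3   # communication with a user or application program
    RULE_EXECUTION = 4  # execution of another rule


@dataclass(frozen=True)
class Rule:
    lhs: Construct
    rhs: Construct
    derives: bool = False  # hypothetical flag: the RHS derives new information


def classify_lhs(rule: Rule) -> str:
    """Value dependent rules test values; value independent rules test
    the execution status of a KML operation or of another rule."""
    if rule.lhs in (Construct.RELATIONSHIP, Construct.COMMUNICATION):
        return "value dependent"
    return "value independent"


def classify_rhs(rule: Rule) -> str:
    """Assumed mapping of RHS consequents onto categories (a), (b) and (c)."""
    if rule.rhs is Construct.RULE_EXECUTION:
        return "inference chain"
    if rule.derives:
        return "deductive"
    return "action"
```

A rule is therefore described by a pair of labels, one from each classification; a trigger that executes another rule, for instance, is both value independent and a builder of an explicit inference chain.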

4.2.1    Value  Dependent  Rules

A  rule  whose  LHS  is  a  KML  construct  that  expresses  a  condition  or  relationship 
between  object  occurrences  is  classified  as  a  value  dependent  rule.  Its  LHS  corresponds  to  a 
declarative  formula  which  may  be  expressed  in  clausal  form  by  extensional  predicates  and 
evaluable  predicates  [CHA84  and  ULL85],  or  equivalently,  in  a  tuple  calculus  form  as  in  the 
QUEL  or  SQL  data  manipulation  languages. 

This declarative formula represents a condition that must be verified against the
knowledge base and is equivalent to a retrieval statement describing the object occurrences
that must be retrieved to satisfy the condition. Thus, we may use the complete retrieval
capability of the KML to describe this condition. A value dependent rule is as follows:

    IF (condition or relationship between object occurrences)
    THEN (RHS consequent)

The LHS of such a rule is true after retrieval of the required object occurrences, or
verification of the described condition. When a condition is verified for a truth value, then
the actual values of the object occurrences that satisfy the condition may not be required for
further use. Several examples of this category of rules are seen in the next chapter. A value
dependent rule which specifies a relationship between occurrences of P*S_P is as follows:

    IF (there exist occurrences X and Y of P*S_P such that
        EQUAL(X.PART, Y.PART))
    THEN (RHS consequent)

We classify this rule as a value dependent rule because satisfying the LHS depends on
specific values of object occurrences stored in the knowledge base. The semantics captured
by such a rule are declarative (or descriptive) semantics. These rules must be executed
against the knowledge base, as will be discussed.
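The equivalence between the LHS of a value dependent rule and a retrieval can be sketched in Python. The sketch assumes that an occurrence is a dictionary keyed by attribute name and that X and Y denote two different occurrences; neither assumption is stated in the text, and the names `lhs_retrieve` and `lhs_holds` are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

Occurrence = dict  # assumed representation: attribute name -> value


def lhs_retrieve(kb: List[Occurrence]) -> Iterator[Tuple[Occurrence, Occurrence]]:
    """Retrieve every pair (X, Y) of different P*S_P occurrences for which
    EQUAL(X.PART, Y.PART) holds."""
    for x, y in combinations(kb, 2):
        if x["PART"] == y["PART"]:
            yield x, y


def lhs_holds(kb: List[Occurrence]) -> bool:
    """Verify the LHS for a truth value only: stop at the first match,
    since the retrieved values themselves are not needed."""
    return next(lhs_retrieve(kb), None) is not None
```

The two functions correspond to the two uses described above: `lhs_retrieve` returns the occurrences that satisfy the condition, while `lhs_holds` only reports whether the condition is satisfied.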

The  condition  being  verified  or  the  values  retrieved  on  the  LHS  of  a  value  dependent 
rule  could  also  involve  KML  constructs  that  support  communication  between  the  KBMS  and 
the  external  world.  The  LHS  condition  could  test  the  values  of  messages  to  and  from  an 
operator,  interrupts  or  control  commands,  error  messages  or  abort  flags,  etc.,  and  these  rules 
can  be  used  to  synchronize  the  KBMS  with  other  programs. 

4.2.2    Value  Independent  Rules

A rule whose LHS tests a KML operation (actually its execution status, as will be
discussed) or the execution status of another rule is classified as a value independent rule.
The KML operations being tested may include retrieval operations and storage operations
(INSERT, DELETE, etc.) as well as user defined operations associated with an object type.
Examples of these rules are also seen in Chapter Five.

A rule which tests the execution status of a KML operation is as follows:

    IF option (KML OPERATION against OBJECT TYPE)
    THEN (RHS consequent)

The KML operation that is tested by this rule is called a triggering operation since it
triggers or causes the execution of the RHS consequent of this rule. These rules, also called
triggers, are value independent since they are independent of the values of object occurrences
stored in the knowledge base and their LHS is satisfied by the KML operations themselves.
These triggers modify a transaction which contains the KML operation by scheduling the
execution of the trigger's RHS consequent within the transaction. Triggers capture
operational semantics, i.e., they specify triggering and scheduling information.

Production systems generally support rules that resemble our value dependent rules.
The support of triggers that capture operational semantics, and are used to modify a
transaction, is a major difference from the production system approach. Triggers are
introduced in conjunction with the concept of a transaction and thus have a major impact
on the process of selecting rules, as will be seen in Chapter Six.

The  execution  status  of  a  KML  operation  identifies  if  the  operation  is  waiting  to  be  exe- 
cuted, is  being  executed  or  has  completed  execution.  There  are  three  options  that  can  be 
specified  in  the  trigger.  The  options  specify  the  semantics  that  control  the  scheduling  of  the 
triggering  operations  and  the  RHS  consequent  of  the  trigger,  within  a  transaction. 

In Figure 4.1, we describe the modifications to a transaction when each of these
options is specified in a trigger. The first option is the pre-execution option, which is
satisfied when the triggering operation is waiting to be executed. When this option is used,
the RHS consequent of the trigger is scheduled to execute before the triggering operation.
The next option is the post-execution option, which is satisfied after the triggering operation
has completed execution. With the post-execution option, the RHS consequent of the trigger
is executed just after the triggering operation. The third option is the parallel-execution
option, which is satisfied during the execution of the triggering operation. With this
option the RHS consequent of the trigger and the triggering operation can be executed in
parallel. The keywords pre-exec, post-exec and par-exec will be used to specify these options.

Each of these options has its own semantics. For example, a retrieval operation
could trigger a security constraint which may find that the retrieval violates the constraint
and may abort the retrieval. This is possible only if the security constraint is executed
before the operation. To support this, we use the pre-exec option with the triggering
retrieval operation on the LHS of the trigger. The security constraint will be the RHS
consequent of this trigger and it will be scheduled to precede the retrieval operation. The
triggering operation will execute only if the RHS consequent does not set a flag to abort the
operation. Thus, with the pre-exec option, the outcome of the RHS consequent must be
tested (for an abort flag) before the triggering operation executes.

A  constraint  which  maintains  an  existence  dependency  following  a  triggering  deletion 
operation  may  require  that  the  constraint  be  executed  after  the  deletion.  Here  the  post-exec 
option  will  be  used  on  the  LHS  of  the  trigger  and  the  constraint  will  be  the  RHS  consequent. 
The  RHS  consequent  will  execute  only  after  the  triggering  operation  completes  execution. 

Alternately,  a  retrieval  operation  may  trigger  a  deductive  rule  that  derives  new 
occurrences  that  may  satisfy  the  retrieval.  In  this  case,  the  triggering  retrieval  operation  and 
the  deductive  rule  can  execute  in  parallel.  However,  the  triggering  operation  must  continue 
execution  as  long  as  the  deductive  rule  is  deriving  new  occurrences.  The  par-exec  option  will 
be  used  on  the  LHS  of  the  trigger  to  capture  these  semantics  with  the  deductive  rule  as  the 
RHS  consequent.  The  triggering  operation  will  not  complete  its  execution  until  the  RHS 
consequent  completes  execution.  A  discussion  on  parallel  execution  will  be  postponed  to  later 
chapters. 
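
The scheduling semantics of the pre-exec and post-exec options can be illustrated with a minimal Python sketch. It is not the KBMS implementation; the names Operation, Trigger, run and demo are hypothetical, and the abort flag is modeled as a simple boolean field. The par-exec option is left out, since its discussion is postponed to later chapters.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Operation:
    name: str            # e.g. "RETRIEVE" or "DELETE"
    abort: bool = False  # set by a pre-exec consequent to cancel the operation


@dataclass
class Trigger:
    option: str                              # "pre-exec" or "post-exec"
    op_name: str                             # the triggering operation
    consequent: Callable[[Operation], None]  # the RHS consequent


def run(op: Operation, triggers: List[Trigger], log: List[str]) -> None:
    """Schedule the RHS consequents around one triggering operation."""
    matching = [t for t in triggers if t.op_name == op.name]
    for t in matching:
        if t.option == "pre-exec":
            t.consequent(op)           # runs before the operation
    if op.abort:                       # abort flag is tested before executing
        log.append(op.name + " aborted")
        return
    log.append(op.name + " executed")
    for t in matching:
        if t.option == "post-exec":
            t.consequent(op)           # runs only after completion


def demo() -> List[str]:
    log: List[str] = []

    def security_constraint(op: Operation) -> None:
        log.append("security constraint")
        op.abort = True                # the retrieval violates the constraint

    def existence_dependency(op: Operation) -> None:
        log.append("existence dependency")

    triggers = [
        Trigger("pre-exec", "RETRIEVE", security_constraint),
        Trigger("post-exec", "DELETE", existence_dependency),
    ]
    run(Operation("RETRIEVE"), triggers, log)
    run(Operation("DELETE"), triggers, log)
    return log
```

In this sketch the retrieval is aborted because its pre-exec consequent sets the flag, while the deletion executes and is then followed by its post-exec consequent.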

A value independent rule or trigger, r1, can also test the execution status of another
rule, r2, defined for an object type, as follows:

trigger r1:  IF option (EXECUTE rule r2)
             THEN (RHS consequent)

Trigger r1 is a value independent rule since it does not directly test values of object
occurrences, in the knowledge base. The rule r2 whose execution status is being tested by
trigger r1 is the triggering rule and it triggers the execution of the RHS consequent of r1.
Triggers such as r1 modify transactions containing the triggering rules. The triggering rules
that are in the transaction are always value dependent rules.

The trigger r1 will modify a transaction based on the option used on its LHS. Figure
4.2 describes the possible modifications to a triggering rule, in a transaction. There are four
different options involved. In addition to the pre-exec, post-exec and par-exec options
discussed earlier with the KML operations, there is a fourth option, namely "successful"
execution. For example,

trigger r1:  IF succ-exec (EXECUTE rule r2)
             THEN (RHS consequent)

In this situation, "successful" execution (succ-exec) guarantees that the LHS of rule r2, whose
execution status is being tested, evaluates to a truth value and the RHS of rule r2 is actually
executed before the RHS of r1 will be scheduled for execution.

The  value  independent  rules  that  test  the  execution  status  of  other  rules  actually  cap- 
ture meta-information;  i.e.,  they  reason  about  the  execution  of  other  rules.  This  information 
can  be  used  while  structuring  rules  within  the  object  types,  as  discussed  in  Chapter  Seven. 

4.2.3   Rules  that  Derive  Information 

Rules  in  the  following  categories  are  classified  on  the  basis  of  their  RHS  constructs.  A 
construct  describing  a  condition  or  relationship  between  object  occurrences  can  occur  on  the 
RHS  of  a  rule.  This  means  that  some  new  information  corresponding  to  the  RHS  consequent 
is  derived  by  applying  the  rule  and  rules  such  as  these  are  called  deductive  rules.  We  use  the 
DERIVE  operation  of  the  KML  to  describe  this  new  information,  as  follows: 
IF  (LHS  condition) 

THEN  (DERIVE  {description  of  derived  object  occurrences}) 

A  rule  such  as  this  also  captures  declarative  (descriptive)  semantics.  The  syntax  used 
to describe the derived occurrences of a DERIVE operation is similar to an INSERT
operation. However, the difference between them is with respect to the temporary or permanent
nature of the data. Data that are inserted are permanent but the existence of derived data is
dependent on pre-conditions, specified in the LHS, which may not always hold. Methods for
handling  derived  data  have  been  investigated  in  NIC78.  For  our  purposes,  we  will  assume 
that  the  derived  occurrences  will  be  differentiated  from  the  (permanently)  inserted 
occurrences  of  an  object  type;  i.e.,  derived  occurrences  will  belong  to  a  temporary  object 
type.   Examples  of  these  rules  are  seen  in  Chapter  Five. 
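
The difference between inserted and derived occurrences can be sketched as follows. This is only an illustration under stated assumptions: the function derive, the low-stock condition and the sample data are hypothetical, and the temporary object type is modeled as a list that is rebuilt from the permanent occurrences whenever it is needed.

```python
from typing import Callable, Dict, List

Occurrence = Dict[str, object]


def derive(source: List[Occurrence],
           lhs: Callable[[Occurrence], bool],
           describe: Callable[[Occurrence], Occurrence]) -> List[Occurrence]:
    """Build the temporary occurrences implied by the current permanent ones."""
    return [describe(o) for o in source if lhs(o)]


# Permanent (inserted) occurrences of a hypothetical object type.
parts: List[Occurrence] = [
    {"PART_NAME": "bolt", "QTY_IN_STOCK": 3},
    {"PART_NAME": "nut", "QTY_IN_STOCK": 500},
]


def low_stock(o: Occurrence) -> bool:
    # LHS condition: hypothetical pre-condition for the derived data.
    return o["QTY_IN_STOCK"] < 10


def describe_low(o: Occurrence) -> Occurrence:
    # RHS description of the derived occurrence.
    return {"PART_NAME": o["PART_NAME"]}


before = derive(parts, low_stock, describe_low)   # one derived occurrence
parts[0]["QTY_IN_STOCK"] = 50                     # the pre-condition stops holding
after = derive(parts, low_stock, describe_low)    # the derived occurrence is gone
```

The inserted occurrences remain in place throughout, whereas the derived occurrence exists only while the LHS condition holds.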

4.2.4    Rules that Execute Actions

Rules  in  this  category  are  those  rules  whose  RHS  construct  corresponds  to  executable 
actions.  The  action  is  a  KML  operation  and  the  rule  is  as  follows: 
IF  (LHS  condition) 

THEN  (KML  OPERATION  against  OBJECT  TYPE) 

The  semantics  clearly  indicate  that  the  KML  operation  is  a  consequent  action  and  must 
be  executed  by  the  KBMS.  A  rule  such  as  this  captures  operational  semantics  or  the  WHAT 
information  mentioned  earlier.  Note  that  we  previously  discussed  the  possibility  that  an 
operation  could  be  aborted  (when  a  constraint  is  violated).  Thus,  before  executing  this  KML 
operation,  the  corresponding  abort  flag  for  the  operation  must  be  tested.  When  execution 
completes,  its  status  will  reflect  completion. 

The  RHS  action  of  rules  in  this  category  could  also  correspond  to  constructs  that  facili- 
tate communication  between  the  KBMS  and  the  external  world.  This  action  could  involve 
messages,  flags,  etc.  When  this  construct  appears  on  the  RHS,  it  is  an  executable  action 
which  allows  the  KBMS  to  synchronize  with  the  external  world. 

4.2.5    Rules that Build Explicit Inference Chains

Rules  that  build  explicit  inference  chains  are  those  rules  which  explicitly  execute  other 
rules.  KML  constructs  corresponding  to  the  execution  of  a  rule  can  occur  on  the  RHS  of 
another rule; i.e., a rule r1 can execute another rule r2, on the RHS, as a consequent action,
as follows:

rule r1:  IF (LHS condition)
          THEN (EXECUTE rule r2 (parameters) )

This is called an "explicit inference chain." Rules such as these capture operational
semantics corresponding to the WHAT information previously mentioned. The support of
explicit inference chains by executing a rule on the RHS is another way in which rules
expressed in the KML differ from productions.

The  rule  r2  that  is  executed  on  the  RHS  must  always  be  a  value  dependent  rule.  An 
explicit  inference  chain  is  one  way  of  selecting  value  dependent  rules  for  execution  and  this 
method  for  selecting  rules  is  not  generally  supported  by  production  systems.  It  can  support 
both  forward  and  backward  inference  chains  as  will  be  discussed  in  the  next  section. 

Explicit inference chains are seen in PLANNER [HEW71 and HEW72] and are
considered to reduce the independence of the rules by embedding control information within a
rule. The alternative is to use a blackboard of shared variables as is common in most
production systems but this tends to hide information about the chain of execution. As an
alternative to a blackboard to pass values between rules, values are passed as parameters between
rules in an explicit inference chain.

Before  the  rule  r2  starts  executing  the  abort  flag  corresponding  to  it  must  be  checked  to 
see  if  it  has  been  set.  After  r2  completes  execution  its  status  will  identify  if  the  execution 
was  "successful." 
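
A minimal sketch of an explicit inference chain is given below, assuming a hypothetical Rule record and execute function. Values are passed as parameters rather than through a blackboard, the abort flag is checked before a rule starts, and the status strings other than "successful" are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rule:
    name: str
    lhs: Callable[..., bool]   # LHS condition over the passed parameters
    rhs: Callable[..., Any]    # RHS consequent
    abort: bool = False        # abort flag, checked before execution
    status: str = "waiting"    # becomes "aborted", "successful" or "completed"


def execute(rule: Rule, *params: Any) -> Any:
    """Execute a rule with the given parameters and record its status."""
    if rule.abort:
        rule.status = "aborted"
        return None
    if rule.lhs(*params):
        result = rule.rhs(*params)
        rule.status = "successful"   # LHS held and the RHS actually ran
        return result
    rule.status = "completed"        # executed, but the LHS did not hold
    return None


# r1 executes r2 on its RHS and passes its own parameter along the chain.
r2 = Rule("r2", lhs=lambda x: x > 0, rhs=lambda x: x * 2)
r1 = Rule("r1", lhs=lambda x: True, rhs=lambda x: execute(r2, x))
result = execute(r1, 5)
```

Because the call to r2 appears on the RHS of r1, the chain of execution is visible in the rule itself instead of being hidden in shared variables.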

An  explicit  inference  chain  also  represents  meta-information  about  the  rules;  i.e.,  one 
rule  controls  the  execution  of  another  rule.  This  meta-information  is  useful  in  structuring 
rules within object types, as is discussed later.

4.3   Advantages of Classifying Rules


We  have  seen  that  using  the  KML  constructs  to  express  rules  increases  the  expressive 
power  of  the  rule  language  by  allowing  the  explicit  specification  of  both  declarative  and 
operational components of problem solving knowledge. In order to improve the clarity of the
rules to the user or designer and their manageability by the KBMS, we require
meta-knowledge about these rules; classifying rules is one way to meet this requirement.

Meta-knowledge obtained from classifying rules occurs in many forms and includes the
category of the LHS and RHS constructs, the distinction between declarative and operational
semantics, the options used in conjunction with the LHS such as post-exec, the identification
of explicit inference chains on the RHS, etc.

Meta-knowledge about the category of the LHS constructs provides the ability to
distinguish between value independent triggers and value dependent rules. This distinction will
assist  the  KBMS  to  select  the  appropriate  subset  of  rules  to  be  processed,  thus  increasing  the 
manageability  of  the  rules.  In  Chapter  Six,  we  describe  the  mechanism  of  applying  rules  in 
the  integrated  KBMS  and  we  identify  the  advantage  of  distinguishing  between  rules,  based 
on  the  LHS  constructs. 

For example, the value independent triggers which check the execution status of
operations or rules on their LHS and capture operational information are used to modify
transactions before execution. The value dependent rules which express declarative semantics and
retrieve object occurrences from the knowledge base on their LHS are incorporated into the
transaction  to  be  executed  against  the  object  occurrences.  Being  able  to  distinguish  between 
these  categories  of  rules  improves  the  efficiency  of  the  mechanism  for  applying  rules. 

Meta-knowledge in the form of the different options, associated with testing the
execution status of operations and rules, is used to schedule the execution of operations and rules
within a transaction. It is also used to determine when a rule or operation is conditionally
executed or when rules and operations can be executed in parallel. This is useful when
optimizing a transaction for the efficient implementation of the KBMS.

In  MOR84,  it  was  suggested  that  constraints  activated  in  a  chained  fashion  should  be 
grouped  together  to  improve  their  manageability.  In  our  system,  using  meta-knowledge  about 
the  RHS  EXECUTE  constructs,  explicit  inference  chains  are  easily  detected.  Clustering 
rules  that  are  explicitly  chained  together  will  facilitate  the  propagation  of  information  along 
the  inference  chain  via  parameter  passing  as  well  as  be  an  aid  to  efficient  implementation,  as 
will  be  seen  in  Chapter  Seven. 

The  object-oriented  paradigm  provided  structure  to  the  knowledge  by  grouping  data 
and relevant rules within the object types. Meta-knowledge about the different categories of
rules  can  be  used  to  further  structure  these  rules  defined  for  a  single  object  type;  this  struc- 
turing will  reflect  and  exploit  differences  in  applying  these  rules  in  the  KBMS.  This  is  also 
discussed  in  Chapter  Seven. 

One  final  note  is  that  any  rule  can  be  used  for  either  forward  chaining  or  backward 
chaining.  It  is  the  inferencing  mechanism  which  determines  the  direction  of  the  chain  by 
binding  variables  on  the  appropriate  side  of  a  rule  and  selecting  rules  for  execution,  based  on 
their  LHS  conditions  (forward  chain)  or  the  RHS  consequent  (backward  chain). 

Most  deductive  systems  are  characterized  by  one  form  of  inferencing  mechanism;  Pro- 
log uses  a  goal  based  backward  chaining  strategy  while  most  production  systems  are  exam- 
ples of  forward  chaining. 

In  our  KBMS,  we  use  a  transaction  oriented  mechanism  for  applying  rules.  As  described 
in  Chapter  Six,  the  process  of  matching  value  independent  triggers  against  the  triggering 
operations  and  rules  of  the  KBMS  transaction  follows  a  forward  chain  of  inference.  How- 
ever, the  explicit  inference  chains  that  (explicitly)  control  the  execution  of  value  dependent 
rules support both forward and backward chaining strategies.

When a rule r1 executes another rule r2 on its RHS, r1 uses parameter passing to bind
variables in r2, as follows:

rule r1:  IF (LHS condition)
          THEN (EXECUTE rule r2 (parameters) )

Depending on the direction of variable binding within rule r2, both forward and
backward inference chains can be supported. A backward inference chain will use the values
passed via the parameters to initially bind variables on the RHS of rule r2. A forward
inference chain corresponds to using values passed via the parameters to initially bind variables
on the LHS of rule r2. This will be seen in the examples discussed in the next chapter. The
desired direction of inference is specified in these rules through the appropriate use of
variable names and parameter names. Consequently, the direction of inference is decided when
each rule is specified.

CHAPTER V
CAPTURING  SEMANTIC  FEATURES  IN  A  KNOWLEDGE  BASE 

In  Chapter  Three  we  described  the  architecture  of  the  integrated,  object-oriented 
KBMS  and  in  Chapter  Four  we  outlined  how  rules  could  be  used  to  capture  declarative  and 
operational  semantics.  In  this  chapter,  we  show,  by  example,  how  semantic  features  found 
useful  in  modeling  knowledge  can  be  captured  by  the  user  object  types  of  the  integrated 
knowledge  base. 

First  we  discuss  semantic  features,  from  both  DBMS  and  AI  literature,  that  have  been 
found  useful  in  modeling  knowledge  from  diverse  domains.  We  introduce  the  system  object 
types  of  our  model,  corresponding  to  the  different  association  types  of  the  underlying  object- 
oriented  semantic  data  model,  SAM*.  We  then  show  examples  of  user  objects  types  that  are 
built  using  the  system  object  types  of  SAM*,  extended  by  a  knowledge  component,  i.e.,  the 
rules  defined  for  each  user  object  type.  The  desirable  semantic  features  are  captured  by 
these  user  object  types  and  their  rules.  Depending  on  the  complexity  of  a  semantic  feature, 
several  system  object  types  and  rules  may  be  needed  to  capture  it. 

5.1    Useful Semantic Features for Modeling Knowledge

Semantic models containing a number of general modeling constructs have been
proposed in DBMS research [BRO81, CHE76, COD79, HAM81, SCH75, SMI77, SU79 and
SU83]. The features captured in these semantic data models parallel those captured in AI
knowledge representation schemes [BOB77, FOX84, GOL77, INT84, MOS83, MYL81,
QUI68, SCH76, SCH83, STE83, SZO77 and WIN75].
The constructs in these models, enhancements to them, and other constructs that are
supported by our object-oriented knowledge representation model are summarized below.

An  object  type  representing  an  entity  or  concept  is  described  by  its  attributes.  Object 
types  are  often  used  to  model  complex  hierarchical  structures;  thus,  the  concept  of  describing 
an  object  type  by  its  attributes  must  be  extended  to  handle  attributes  that  are  themselves 
object  types.  This  could  lead  to  recursively  defined  object  types  where  an  attribute  of  an 
object  type  has  the  same  definition  as  the  object  type  itself. 

Next,  generalization  or  "is-a"  hierarchies  model  classes  of  object  types  and  their  sub- 
classes and  are  capable  of  supporting  the  inheritance  of  attributes  and  operations,  along  the 
hierarchy.  This  feature  must  be  extended  to  support  the  inheritance  of  rules  as  well.  The 
hierarchies  must  also  be  able  to  handle  multiple  inheritance  when  the  hierarchy  represents  a 
graph rather than a tree. This is an extension of the strict hierarchies of Smalltalk-80.

The  "is-a-part-of"  or  component  relationship  must  be  extended  to  include  complex 
object  types  whose  components  are  also  complex  object  types.  This,  too,  could  lead  to  recur- 
sively defined  object  types.  The  ability  to  group  objects  into  a  class  and  to  identify  attri- 
butes of  the  class  is  used  to  represent  summary  functions  such  as  the  maximum,  average, 
etc.,  for  all  members  of  the  class.  Another  useful  feature  is  the  ability  to  model  facts  or 
events  that  result  from  the  interactions  of  the  occurrences  of  two  (or  more)  independent 
object  types  as  well  as  the  attributes  that  describe  these  interactions.  It  is  also  useful  to 
identify  properties  such  as  transitivity,  symmetry,  reflexivity,  etc.,  to  describe  relationships 
between  object  types  and  occurrences. 

The  membership  or  "as-a"  relationship  is  used  to  distinguish  between  an  object  type 
and  the  individual  object  occurrences  (or  instances)  of  that  type.  This  feature  must  be 
extended  in  order  to  associate  knowledge  with  individual  object  occurrences. 

Useful problem solving knowledge that is captured in both DBMS and AI systems is
often in the form of constraints or rules. This includes integrity and security constraints
common in DBMS and AI truth maintenance systems, deductive rules that describe how new
facts can be derived, often through the use of properties such as transitivity, and expert
problem solving rules defined for an application domain. This knowledge, too, must be
represented within the object types of the integrated knowledge base.

The  knowledge  component,  in  the  form  of  rules  which  are  also  objects,  may  be  defined 
for  a  system  object  type,  a  user  object  type  or  for  an  occurrence  of  a  user  object  type.  A  rule 
defined  for  a  system  object  type  captures  semantics  which  are  inherent  to  the  semantic 
model;  it  will  be  inherited  by  user  object  types  built  using  the  system  object  type.  A  rule 
defined  for  a  user  object  type  (or  one  of  its  occurrences)  captures  domain  semantics  relevant 
to  a  particular  application  domain.  Related  work  on  this  subject  is  described  in  RAS85. 
Note  that  the  terms  SAM*  association  type  and  system  object  type  will  be  used  interchange- 
ably in  this  chapter. 

5.2   Representing the Attributes of an Object Type

An  object  type  is  usually  defined  by  its  attributes  or  characteristics;  such  a  grouping  of 
a  set  of  attributes  to  represent  an  object  type  must  be  captured  in  the  knowledge  base.  In 
Figure  5.1,  PART_DEF  is  a  user  object  type  representing  a  collection  of  parts  and  is 
described by its attributes PART_NAME, PART_DESC, etc. PART_DEF will be modeled
by  an  aggregation  (A)  association  type  (system  object  type).  The  A  association  type  of 
SAM*  is  a  system  object  type  that  defines  an  object  type  by  a  set  of  characteristic  attributes, 
each  of  which  is  represented  by  an  attribute  type.  Each  occurrence  of  this  object  type  is  a 
member  of  the  cross  product  of  the  domains  of  the  attribute  types  involved  and  is 
represented  by  the  values  of  its  attributes. 
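
The statement that each occurrence is a member of the cross product of the attribute domains can be made concrete with a small sketch. The finite domains, the sample values and the helper is_occurrence below are assumptions chosen for illustration; actual domains such as strings or integers are not enumerable in this way.

```python
from itertools import product
from typing import Dict, Set

# Hypothetical, deliberately tiny domains for two attribute types.
PART_NAME: Set[str] = {"bolt", "nut"}
QTY_IN_STOCK: Set[int] = {0, 1, 2}

domains: Dict[str, set] = {
    "PART_NAME": PART_NAME,
    "QTY_IN_STOCK": QTY_IN_STOCK,
}


def is_occurrence(occ: Dict[str, object], doms: Dict[str, set]) -> bool:
    """True when occ supplies one value from each attribute domain."""
    return occ.keys() == doms.keys() and all(occ[a] in doms[a] for a in doms)


# The cross product of the domains: every candidate occurrence.
cross = set(product(PART_NAME, QTY_IN_STOCK))

valid = is_occurrence({"PART_NAME": "bolt", "QTY_IN_STOCK": 2}, domains)
invalid = is_occurrence({"PART_NAME": "gear", "QTY_IN_STOCK": 2}, domains)
```

An occurrence is thus fully described by the tuple of its attribute values, one drawn from each domain.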

The  domain  of  an  attribute  can  be  defined  by  a  membership  (M)  association  type  which 
is  used  to  group  together  similar  atomic  concepts;  it  is  formed  by  a  set  of  distinct  elements  of 
the  same  data  type.  The  data  type  can  be  a  simple  data  type,  e.g.,  integer,  a  complex  data 
type, e.g., vector, or an abstract data type, e.g., RULE.

In Figure 5.1, M association types are used to define the domains of the attributes of
the user object type PART_DEF. These domains are PART_NAME and PART_DESC
whose data type is string, QTY_IN_STOCK of data type integer, PROD_COST and
MARK_UP whose data type is real and ORDER-JROC whose data type is RULE. P*S_P
is another user object type representing a part-subpart relationship. It is also modeled by an
A association type with attributes PART and SUB_PARTS.

In  the  figures  in  this  section,  a  network  representation  using  labelled  nodes  and  arcs  is 
used.  A  labelled  node  is  a  concept  defined  in  terms  of  other  concepts  pointed  to  by  the 
directed  arcs  leading  from  the  node.  Attributes  are  represented  by  arcs  whose  domains  are 
pointed  to  by  the  arrow.  A  labelled  arc  represents  an  attribute  whose  name  is  different  from 
the domain name. PART and SUB_PARTS have the same underlying domain
PART_NAME. The data type of the attribute may be different from the domain data type,
e.g., SUB_PARTS is a set of values obtained from the domain PART_NAME.

The knowledge component of user object types PART_DEF and P*S_P is in the form
of rules that are either inherited from the system object types or are specified for the user
object type. Rules that maintain the semantics of the system object type used to model the
user object type are inherited. For example, in addition to the object identification, OID,
which is used to distinguish all objects in the knowledge base, the A association type defines
a uniqueness property over one (or more) attributes; these attributes must have distinct
values for each occurrence of the type. The user object type PART_DEF, modeled by an A
association object type, inherits a uniqueness constraint, say on the attribute PART_NAME.
The user object type PART_DEF inherits a rule, unique_PART_NAME, which checks
each occurrence of PART_DEF for uniqueness of attribute PART_NAME. Rule
unique_PART_NAME is executed when an occurrence is inserted into object type
PART_DEF. PART_DEF also inherits a rule T1 which triggers the execution of
unique_PART_NAME when there is a corresponding INSERT operation. Since there is a
possibility that rule unique_PART_NAME can abort the insertion into PART_DEF (when
the constraint is not met), this rule must be applied before the insertion is executed by the
KBMS. We use the pre-exec option with the INSERT operation in rule T1 to provide the
KBMS with this scheduling information.

A rule such as T1, that tests the execution status of a KML (INSERT) operation
executing against an object type (PART_DEF), is a value independent trigger as described in
Chapter Four. Rule unique_PART_NAME is a value dependent rule which is explicitly
selected for execution by trigger T1. In this explicit inference chain, T1 passes the value, X,
of the inserted occurrence of PART_DEF as a parameter into unique_PART_NAME.

Note  that  insertion  into  PART_DEF  also  affects  the  attribute  QTY_IN_STOCK  and 
its  domain  which  is  modeled  by  an  M  association  type.  This  domain  may  be  subject  to  user- 
defined  membership  constraints  specified  by  a  range,  or  an  enumeration,  of  acceptable  values. 
In this application, the desired constraint is that the attribute values of
QTY_IN_STOCK for each occurrence of PART_DEF must not exceed a maximum value of
10,000. A value dependent rule qty_constr_1, defined for QTY_IN_STOCK, maintains this
constraint and is executed by the value independent trigger T1', when an insertion into
PART_DEF is attempted. The rules T1, T1' and unique_PART_NAME, defined for
PART_DEF, and qty_constr_1, defined for QTY_IN_STOCK, are as follows:

T1 : PART_DEF
    IF pre-exec(INSERT an occurrence X into PART_DEF)
    THEN (EXECUTE unique_PART_NAME(X) : PART_DEF)

unique_PART_NAME(X) : PART_DEF
    IF (for an occurrence X of PART_DEF, there exists an occurrence Y of PART_DEF
        such that EQUAL(X.PART_NAME, Y.PART_NAME) AND NOT_EQUAL(X, Y) )
    THEN (alert the KBMS to reject X)

T1' : PART_DEF
    IF pre-exec(INSERT an occurrence X into PART_DEF)
    THEN (EXECUTE qty_constr_1(X.QTY_IN_STOCK) : QTY_IN_STOCK)

qty_constr_1(X) : QTY_IN_STOCK
    IF (for an occurrence X of QTY_IN_STOCK, GREATER_THAN(X, 10000) )
    THEN (alert the KBMS to reject X)
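
The effect of T1, T1' and the two constraint rules on an insertion can be sketched in Python. This is an illustration only, assuming occurrences are dictionaries and PART_DEF is a list; the function insert_part is hypothetical and simply runs both pre-exec consequents before the insertion.

```python
from typing import Dict, List

Occurrence = Dict[str, object]
MAX_QTY = 10_000  # maximum value of QTY_IN_STOCK stated in the text


def unique_part_name(x: Occurrence, part_def: List[Occurrence]) -> bool:
    """True when X must be rejected: another occurrence has the same PART_NAME."""
    return any(y is not x and y["PART_NAME"] == x["PART_NAME"] for y in part_def)


def qty_constr_1(qty: int) -> bool:
    """True when the value must be rejected: it exceeds the maximum."""
    return qty > MAX_QTY


def insert_part(x: Occurrence, part_def: List[Occurrence]) -> bool:
    """INSERT with both pre-exec consequents applied before the insertion."""
    if unique_part_name(x, part_def) or qty_constr_1(x["QTY_IN_STOCK"]):
        return False          # abort flag set: the insertion is not executed
    part_def.append(x)
    return True


part_def: List[Occurrence] = []
first = insert_part({"PART_NAME": "bolt", "QTY_IN_STOCK": 5}, part_def)
duplicate = insert_part({"PART_NAME": "bolt", "QTY_IN_STOCK": 7}, part_def)
too_many = insert_part({"PART_NAME": "nut", "QTY_IN_STOCK": 10_001}, part_def)
at_limit = insert_part({"PART_NAME": "nut", "QTY_IN_STOCK": 10_000}, part_def)
```

Only the first and last insertions succeed; a value equal to the maximum is accepted because the rule tests GREATER_THAN.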


The user object type PART_DEF may be subject to inter-occurrence and
intra-occurrence constraints. This is application specific knowledge which only applies to a
particular user object type. For example, the user may wish to limit profits by maintaining a limit
on the attribute MARK_UP at a maximum of 50 percent of the PROD_COST. A value
dependent rule, lim_prof_1, maintains this intra-occurrence constraint by executing an
UPDATE operation when the constraint is violated by an insertion into PART_DEF. It is
executed by a trigger T2. Based on its RHS, lim_prof_1 is classified as a rule that executes
an action while T2 is a rule that builds an explicit inference chain. Both rules are defined by
the user for the object type PART_DEF. This expert rule lim_prof_1, unlike
unique_PART_NAME, does not abort the insertion into PART_DEF and may be scheduled
for execution after the insertion. We use the post-exec option with the INSERT operation in
trigger T2.

T2 : PART_DEF
    IF post-exec(INSERT occurrence X into PART_DEF)
    THEN (EXECUTE lim_prof_1(X) : PART_DEF)

lim_prof_1(X) : PART_DEF
    IF (for an occurrence X of PART_DEF,
        GREATER_THAN (X.MARK_UP, PRODUCT (0.5, X.PROD_COST)) )
    THEN (UPDATE X such that X.MARK_UP =
          PRODUCT (0.5, X.PROD_COST) )
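
A small sketch of the post-exec behavior of T2 and lim_prof_1 follows. The dictionary representation of an occurrence and the function insert_with_post_exec are assumptions made for illustration; the point shown is that the insertion completes first and the UPDATE is applied afterwards.

```python
from typing import Dict, List

Occurrence = Dict[str, float]


def lim_prof_1(x: Occurrence) -> None:
    """Cap MARK_UP at 50 percent of PROD_COST by updating the occurrence."""
    cap = 0.5 * x["PROD_COST"]
    if x["MARK_UP"] > cap:
        x["MARK_UP"] = cap


def insert_with_post_exec(x: Occurrence, part_def: List[Occurrence]) -> None:
    part_def.append(x)   # the triggering INSERT completes first
    lim_prof_1(x)        # post-exec consequent: never aborts the insertion


part_def: List[Occurrence] = []
high = {"PROD_COST": 100.0, "MARK_UP": 80.0}
low = {"PROD_COST": 100.0, "MARK_UP": 20.0}
insert_with_post_exec(high, part_def)
insert_with_post_exec(low, part_def)
```

Both occurrences are inserted; only the one violating the limit is modified.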

As mentioned in Chapter Three, a rule can be defined for an object type, in which case it
applies to all occurrences of the object type, or it can be defined for a particular occurrence of
an object type. In the next example, each occurrence of PART_DEF has a different rule
associated with it. This is modeled by an attribute ORDER_RULE of object type PART_DEF;
this attribute is of abstract data type, RULE. Then, for each occurrence of PART_DEF,
the value of this attribute is a rule associated with it.

For all occurrences of PART_DEF whose attribute PART_DESC has a value "military
equipment", the rule ORDER_RULE must be executed, after the corresponding
PART_DEF occurrences are retrieved. This information is captured by trigger T3 which
uses the post-exec option to execute rule milit_constr_1. Rule milit_constr_1 is a value
dependent rule which checks the value of attribute PART_DESC of each retrieved
occurrence of PART_DEF. If satisfied, milit_constr_1 executes the rule stored as a value of
ORDER_RULE with that occurrence of PART_DEF. The rules are as follows:

T3 : PART_DEF
    IF post-exec(RETRIEVE an occurrence X of PART_DEF)
    THEN (EXECUTE milit_constr_1(X) : PART_DEF)

milit_constr_1(X) : PART_DEF
    IF (for an occurrence X of PART_DEF
        EQUAL (X.PART_DESC, "military equipment") )
    THEN (EXECUTE X.ORDER_RULE)

an example value of ORDER_RULE
    IF (GREATER_THAN (PROD_COST, 1500) AND
        LESS_THAN (QTY_IN_STOCK, 10) )
    THEN (alert the KBMS)
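
Storing a rule as an attribute value can be sketched by holding a callable in the occurrence itself. The representation below is an assumption for illustration: the part names and the alerts list are hypothetical, and the "alert the KBMS" action is modeled by appending the part name to that list.

```python
from typing import Callable, Dict, List

Occurrence = Dict[str, object]


def example_order_rule(x: Occurrence) -> bool:
    """The example ORDER_RULE value: its LHS over the occurrence's own attributes."""
    return x["PROD_COST"] > 1500 and x["QTY_IN_STOCK"] < 10


def milit_constr_1(x: Occurrence, alerts: List[str]) -> None:
    if x["PART_DESC"] == "military equipment":
        rule: Callable[[Occurrence], bool] = x["ORDER_RULE"]
        if rule(x):                       # EXECUTE X.ORDER_RULE
            alerts.append(x["PART_NAME"])  # stands in for "alert the KBMS"


def retrieve(part_def: List[Occurrence], alerts: List[str]) -> List[Occurrence]:
    result = list(part_def)               # the triggering RETRIEVE
    for x in result:                      # post-exec consequent of T3
        milit_constr_1(x, alerts)
    return result


part_def: List[Occurrence] = [
    {"PART_NAME": "radar", "PART_DESC": "military equipment",
     "PROD_COST": 2000, "QTY_IN_STOCK": 3, "ORDER_RULE": example_order_rule},
    {"PART_NAME": "bolt", "PART_DESC": "hardware",
     "PROD_COST": 2000, "QTY_IN_STOCK": 3, "ORDER_RULE": example_order_rule},
]
alerts: List[str] = []
retrieved = retrieve(part_def, alerts)
```

Each occurrence could carry a different callable; here the same example rule is reused to keep the sketch short.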

Currently,  there  are  certain  restrictions  imposed  on  rules  which  are  attributes  of  type 
RULE.  One  restriction  is  that  the  rules  should  be  value  dependent  rules;  i.e.,  they  may  not 
test  the  execution  status  of  operations  or  rules  executed  against  other  objects  in  the 
knowledge  base.  The  conditions  that  they  test  on  the  LHS  should  only  involve  the  particular 
object  occurrence  for  which  the  rule  is  specified.  This  restriction  makes  the  rule  sensitive 
only  to  the  attribute  values  of  the  occurrence  for  which  it  is  specified. 

The  RHS  consequent  action  is  restricted  to  execute  against  the  particular  occurrence 
for  which  the  rule  is  specified.  The  RHS  action  could  also  be  part  of  an  explicit  inference 
chain  by  executing  another  value  dependent  rule.  However,  the  rule  in  the  explicit  inference 
chain  must  also  be  an  attribute  of  type  RULE,  specified  for  the  same  object  occurrence. 

As  a  result  of  these  restrictions,  the  scope  of  the  knowledge  rule  is  restricted  to  the 
occurrence  for  which  it  is  defined.  Conceptually,  there  is  no  necessity  to  impose  these  res- 
trictions. The  reason  we  currently  impose  these  scoping  restrictions  is  to  simplify  the  KBMS 
prototype;  this  will  be  discussed  with  strategies  for  applying  rules,  in  Chapters  Six  and 
Seven.  Finally,  since  these  rules  are  actually  attributes  of  an  object  occurrence,  they  do  not 
have an execution status associated with them. Value independent rules may not test their
execution status or modify transactions that execute these rules. This, too, simplifies the
prototype of the KBMS.

5.3    Modeling the Use of Properties Such as Transitivity

The user may wish to include knowledge that makes use of properties such as transitivity.
For example, if P*S_P is an object type that captures the part-subpart relationship,
then the transitive closure of this relationship will derive all the subparts of a part. All the
occurrences derived using the transitivity property will be stored in an object type
der_P*S_P. The object type der_P*S_P is also modeled by an A association type and is
similar in structure to P*S_P. We use separate object types so as to differentiate between
permanent occurrences of P*S_P, which are directly inserted, and the temporary occurrences
of der_P*S_P (its transitive closure), which are derived and whose existence depends on
certain conditions which may not always be true in the knowledge base.

Value dependent rules trans_cl_1 and trans_cl_2, defined for the derived object type
der_P*S_P, define the transitive closure of P*S_P. Based on their RHS, they are classified
as deductive rules that derive new information. The two rules are executed by triggers T4
and T4', respectively, when occurrences of der_P*S_P are retrieved.

T4 : der_P*S_P

IF par-exec(RETRIEVE occurrence X from der_P*S_P)
THEN (EXECUTE trans_cl_1(X) : der_P*S_P)

trans_cl_1(Z) : der_P*S_P

IF (there exists an occurrence Y of P*S_P)

THEN (DERIVE occurrence Z of der_P*S_P such that Z = Y)

T4' : der_P*S_P

IF par-exec(RETRIEVE occurrence X from der_P*S_P)

THEN (EXECUTE trans_cl_2(X) : der_P*S_P)


trans_cl_2(Z) : der_P*S_P

THEN (DERIVE occurrence Z of der_P*S_P where Z.PART = P.PART

AND Z.SUB_PARTS = the union of P.SUB_PARTS and Q.SUB_PARTS)

This chain of inference, corresponding to triggers T4 and T4' executing value dependent
rules trans_cl_1 and trans_cl_2, is an example of a backward chain. In a backward
chain, the goal or triggering operation (the retrieval operation), initially binds occurrences on
the RHS of a rule. These bindings are then passed to the LHS of that rule. In the example,
triggers T4 and T4' pass the attributes of the X occurrences (to be retrieved) as parameters
into trans_cl_1 and trans_cl_2. There, they first bind with the attributes of the Z
occurrences on the RHS of these rules.

We use the special knowledge manipulation operation DERIVE to indicate that these
new occurrences are generated through the application of a deductive rule. Derived data are
not usually a permanent part of the knowledge base and are generated when required. Since
we specify the par-exec option on the LHS of the triggers T4 and T4', the rules trans_cl_1
and trans_cl_2 can be applied in parallel with the retrieval operation.

Transitive closure exhibits recursive properties, as is seen in this example. The triggering
retrieval operation against der_P*S_P causes the execution of trans_cl_2. In trying to
satisfy its LHS, rule trans_cl_2 will attempt to retrieve appropriate P occurrences; this
corresponds to the execution of another retrieval operation against der_P*S_P. Triggers T4
and T4' will match against this new retrieval operation and will cause rules trans_cl_1 and
trans_cl_2 to be recursively executed.
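
The recursion described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the dictionary `P_S_P`, its sample parts and the function `retrieve_derived` are hypothetical, the part-subpart data are assumed to be acyclic, and the two comments indicate which rule each step is meant to mirror.

```python
from typing import Dict, Set

# Permanent occurrences of P*S_P: each part maps to its direct subparts.
# The sample data are invented for the illustration.
P_S_P: Dict[str, Set[str]] = {
    "car": {"engine", "wheel"},
    "engine": {"piston", "crankshaft"},
    "piston": {"ring"},
}


def retrieve_derived(part: str) -> Set[str]:
    """Retrieve the derived (der_P*S_P) subparts of `part` on demand."""
    # Role of trans_cl_1: every directly inserted occurrence is also derived.
    derived = set(P_S_P.get(part, set()))
    # Role of trans_cl_2: satisfying the rule requires further retrievals
    # against der_P*S_P, which trigger the same rules again (the recursion).
    for sub in P_S_P.get(part, set()):
        derived |= retrieve_derived(sub)
    return derived
```

Nothing is stored between calls, which corresponds to derived occurrences being temporary and generated only when a retrieval requires them.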

The same rules trans_cl_1 and trans_cl_2 can also be used to compute the transitive
closure of P*S_P via a forward chain of inference. To support a forward chain, the triggers
must be changed to execute the rules when new occurrences are inserted into P*S_P or
derived into der_P*S_P. Now, the new occurrences will initially bind attributes on the LHS
of the rules.

The detection of recursive rules and evaluation strategies (when there are a large
number of occurrences that can satisfy the rules, thus requiring efficient search strategies)
has been studied [BAY85, CHA81, HEN84, STO84, ULL85 and WON84] and is still an open
topic for research. Chapter Eight of this dissertation deals with the efficient evaluation of
linear recursive rules.
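
For comparison with the backward chain, the forward chain mentioned in this section can be sketched as follows. This is a minimal sketch, not the evaluation strategy of Chapter Eight: the function `insert_forward` and the pair representation are assumptions made for the illustration, and the set passed in is assumed to be transitively closed already.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (part, subpart)


def insert_forward(derived: Set[Pair], new: Pair) -> Set[Pair]:
    """Insert a (part, subpart) pair and derive all pairs that follow from it.

    `derived` is assumed to be transitively closed before the insertion.
    """
    part, sub = new
    # Everything that already reaches `part`, plus `part` itself.
    ancestors = {a for (a, s) in derived if s == part} | {part}
    # Everything already reachable from `sub`, plus `sub` itself.
    descendants = {d for (p, d) in derived if p == sub} | {sub}
    # The new occurrence binds the LHS; the consequences are derived eagerly.
    return derived | {(a, d) for a in ancestors for d in descendants}


closure: Set[Pair] = set()
for pair in [("car", "engine"), ("piston", "ring"), ("engine", "piston")]:
    closure = insert_forward(closure, pair)
```

Here the work is done at insertion time, so a later retrieval finds the closure already materialized, whereas the backward chain computes it at retrieval time.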

5.4    Grouping Objects into Generalization "Is-A" Hierarchies

Similar objects can be grouped together to form a generic type. The generic object
type and its constituent object types will form a generalization hierarchy. In the example of
Figure 5.2, GOVT_PROJECT is a generic object type of all government projects and it
groups together all members of its three constituent object types, NON_MILITARY_PROJ,
MILITARY_PROJ and TOP_SECRET_PROJ. An occurrence of a constituent object type
is also a member of the generic object type GOVT_PROJECT.

In our model, we use a generalization (G) association type to model a generic object
type. Thus, GOVT_PROJECT is modeled by a G association type. Its three constituents
are modeled by A association types. The constituent object types of a generic object type
could be dissimilar, i.e., the constituent object types could have different attributes, rules,
etc., to describe them. The occurrences of a user object type modeled by a G association
type are obtained by taking the outerjoin of the occurrences of its constituents. The generic
object type simply groups together existing object occurrences but does not create any new
object occurrences. Thus, the set of OID values for GOVT_PROJECT is formed by the
union of the OID values of its constituents.
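
The outerjoin of constituent occurrences can be illustrated with a short Python sketch. The function `generic_occurrences` and the representation of each constituent as a dictionary keyed by OID are assumptions made for this example; attributes that an occurrence does not have are left as `None`.

```python
from typing import Dict, Optional

Occ = Dict[str, Optional[str]]


def generic_occurrences(constituents: Dict[str, Dict[int, Occ]]) -> Dict[int, Occ]:
    """Outer-join the constituent occurrences on OID.

    The resulting OID set is the union of the constituents' OID sets;
    no new object occurrences are created.
    """
    all_attrs = sorted({attr
                        for occs in constituents.values()
                        for occ in occs.values()
                        for attr in occ})
    result: Dict[int, Occ] = {}
    for occs in constituents.values():
        for oid, occ in occs.items():
            # One row per OID; attributes missing in a constituent stay None.
            row = result.setdefault(oid, {attr: None for attr in all_attrs})
            row.update(occ)
    return result
```

An occurrence that appears in two constituents (same OID) yields a single row of the generic type, which is why the generic type groups occurrences rather than creating them.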

Although  the  generalization  we  have  described  is  a  strict  hierarchy  of  user  object  types, 
the  occurrences  themselves  may  not  be  hierarchically  organized.  For  example,  an  occurrence 
of  TOP_SECRET_PROJ  could  also  be  an  occurrence  of  MILITARY_PROJ.  To  model  this, 
four types of constraints (set exclusion, set equality, set subset and set intersection) can be
specified for the generic object type. These constraints specify the set relationship between
the occurrences of a pair of constituent object types. The constraint is based on the OID
values of these occurrences.

The set constraints between the constituents of GOVT_PROJECT are shown in Figure 5.2.
For any generic object type, the specific set constraints are application domain
knowledge;  i.e.,  corresponding  rules  will  not  be  inherited  but  must  be  defined  for  each  user 
object  type.  In  this  application,  there  is  a  set  exclusion  constraint  between  occurrences  of 
MILITARY_PROJ  and  NON_MILITARY_PROJ  while  occurrences  of 
TOP_SECRET_PROJ  form  a  subset  of  MILITARY_PROJ.  The  generic  object  type 
GOVT_PROJECT  must  be  defined  by  rules  that  check  for  the  appropriate  set  constraints. 
These  rules  will  be  triggered  when  occurrences  are  inserted  into  the  constituent  object  types. 

GOVT_PROJECT has a value dependent rule gen_constr_1 which ensures set exclusion;
i.e., an occurrence that is inserted into TOP_SECRET_PROJ should not previously
have been inserted into NON_MILITARY_PROJ. Trigger T5, defined for
TOP_SECRET_PROJ, executes gen_constr_1 when there is an insertion into
TOP_SECRET_PROJ. Since gen_constr_1 can abort the insertion into
TOP_SECRET_PROJ, T5 uses the pre-exec option with the INSERT operation.

A rule gen_constr_2, also defined for GOVT_PROJECT, inserts a corresponding
occurrence into MILITARY_PROJ to satisfy the set subset constraint (if it is not already
satisfied). This occurrence will have the same OID value as the inserted occurrence of
TOP_SECRET_PROJ. Attribute values will be obtained from the user or set to null values.
Rule gen_constr_2 can be executed in parallel with the insertion into TOP_SECRET_PROJ.
A trigger, T6, defined for TOP_SECRET_PROJ, executes gen_constr_2 using the par-exec
option. The rules are as follows:

T5 : TOP_SECRET_PROJ

IF pre-exec(INSERT an occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE gen_constr_1(X) : GOVT_PROJECT)

gen_constr_1(X) : GOVT_PROJECT

IF (for an inserted occurrence X of TOP_SECRET_PROJ, there exists
an occurrence Y of NON_MILITARY_PROJ, such that EQUAL(X.OID, Y.OID) )
THEN (alert the KBMS to reject occurrence X)

T6 : TOP_SECRET_PROJ

IF par-exec(INSERT an occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE gen_constr_2(X) : GOVT_PROJECT)

gen_constr_2(X) : GOVT_PROJECT

IF (for an inserted occurrence X of TOP_SECRET_PROJ
NOT_SET_MEMBER(X.OID, MILITARY_PROJ.OID) )

THEN (INSERT an occurrence Z into MILITARY_PROJ where Z.OID = X.OID )
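
The effect of T5/gen_constr_1 and T6/gen_constr_2 on the OID sets can be sketched in Python. This is an illustration only: the class `GovtProject` and the method `insert_top_secret` are hypothetical, and the par-exec rule is run sequentially after the insertion for simplicity.

```python
from typing import Dict, Set


class GovtProject:
    """OID sets of the three constituents of GOVT_PROJECT."""

    def __init__(self) -> None:
        self.oids: Dict[str, Set[int]] = {
            "NON_MILITARY_PROJ": set(),
            "MILITARY_PROJ": set(),
            "TOP_SECRET_PROJ": set(),
        }

    def insert_top_secret(self, oid: int) -> bool:
        # T5 (pre-exec) runs gen_constr_1: set exclusion with NON_MILITARY_PROJ.
        if oid in self.oids["NON_MILITARY_PROJ"]:
            return False  # the occurrence is rejected; nothing is inserted
        self.oids["TOP_SECRET_PROJ"].add(oid)
        # T6 (par-exec) runs gen_constr_2: keep TOP_SECRET_PROJ a subset of
        # MILITARY_PROJ. Adding to a set is a no-op if the OID is present,
        # which plays the role of the NOT_SET_MEMBER test.
        self.oids["MILITARY_PROJ"].add(oid)
        return True
```

The pre-exec check must finish before the insertion because it can abort it, while the subset repair never rejects the insertion and can therefore proceed alongside it.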

In the generalization hierarchy, each member of a constituent object type is also a
member of the generic object type. Thus, when an occurrence of TOP_SECRET_PROJ is
inserted, a corresponding occurrence of GOVT_PROJECT must be inserted (if it does not
already exist). This knowledge is inherent to the knowledge representation model; rules that
support this are inherited by user object types modeled by G association types. A rule
gen_hier_1 inherited by GOVT_PROJECT maintains this hierarchy by inserting a
corresponding occurrence into GOVT_PROJECT, when necessary. A value independent
trigger T7, also inherited by TOP_SECRET_PROJ, executes gen_hier_1 in parallel with an
insertion into TOP_SECRET_PROJ. They are as follows:

T7 : TOP_SECRET_PROJ

IF par-exec(INSERT an occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE gen_hier_1(X) : GOVT_PROJECT)

gen_hier_1(X) : GOVT_PROJECT

IF (for occurrence X inserted into a component of GOVT_PROJECT
NOT_SET_MEMBER(X.OID, GOVT_PROJECT.OID) )

THEN (INSERT an occurrence P into GOVT_PROJECT where P.OID = X.OID )

5.5  Attribute Inheritance in the Generalization Hierarchy

Attributes that describe a generic object type are inherited by the occurrences of the
constituent object types. This semantic feature is called "attribute inheritance". For
example, two attributes, LOCATION and STATUS, describe all occurrences of the generic object
type GOVT_PROJECT and are inherited by the occurrences of its three constituent object
types.

Our  model  supports  the  inheritance  of  descriptive  attributes  defined  for  a  generic  user 
object  type.  Rules  to  support  attribute  inheritance  will  be  inherited  by  these  user  object 
types. 

In the previous section, a rule gen_hier_1 inserts an occurrence into the generic object
type GOVT_PROJECT, whenever an occurrence is inserted into TOP_SECRET_PROJ.
Before this insertion into GOVT_PROJECT is executed, values for the inherited attributes
must be obtained from the user. A value independent trigger T8, inherited by
GOVT_PROJECT, will obtain the values of LOCATION and STATUS from the user (for
the corresponding occurrence of TOP_SECRET_PROJ), before the insertion into
GOVT_PROJECT. The trigger T8 is as follows:

T8 : GOVT_PROJECT

IF pre-exec(INSERT an occurrence X into GOVT_PROJECT)

THEN (obtain values for attributes X.LOCATION and X.STATUS from user)
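
A minimal Python sketch of how gen_hier_1 and T8 cooperate is given below. The function `insert_govt_project` and the callback `ask_user` are hypothetical names; the callback merely stands in for whatever dialogue obtains the attribute values from the user.

```python
from typing import Callable, Dict

INHERITED_ATTRS = ("LOCATION", "STATUS")


def insert_govt_project(govt_project: Dict[int, Dict[str, str]],
                        oid: int,
                        ask_user: Callable[[str], str]) -> bool:
    """Insert the generic occurrence for `oid` if it does not exist yet."""
    # gen_hier_1: insert only when the OID is not already a member.
    if oid in govt_project:
        return False
    # T8 (pre-exec): obtain the inherited attribute values before inserting.
    values = {attr: ask_user(attr) for attr in INHERITED_ATTRS}
    govt_project[oid] = values
    return True
```

The user is asked only when an insertion into GOVT_PROJECT actually takes place, which matches T8 being attached to the INSERT operation rather than to gen_hier_1 itself.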

The occurrences of GOVT_PROJECT and its constituents may also be subject to
application specific knowledge defined by the user. There may be a constraint specifying
that the attribute LOCATION of all occurrences of TOP_SECRET_PROJ must have a
value "Virginia" if the attribute STATUS has a value "testing". A value dependent rule
loc_stat_1, defined for TOP_SECRET_PROJ, tests the occurrence of GOVT_PROJECT
corresponding to an occurrence of TOP_SECRET_PROJ, for this constraint. Note that the
constraint tests the values of inherited attributes. Rule loc_stat_1 updates the occurrence of
GOVT_PROJECT, if necessary, to maintain this constraint. A trigger T9, defined for
TOP_SECRET_PROJ, executes rule loc_stat_1 after an insertion into
TOP_SECRET_PROJ. Rules loc_stat_1 and T9 are as follows:

T9 : TOP_SECRET_PROJ

IF post-exec(INSERT occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE loc_stat_1(X) : GOVT_PROJECT)

loc_stat_1(X) : GOVT_PROJECT

IF (for an inserted occurrence X of TOP_SECRET_PROJ
there exists an occurrence Y of GOVT_PROJECT such that EQUAL(X.OID, Y.OID)
AND EQUAL(Y.STATUS, "testing") AND
NOT_EQUAL(Y.LOCATION, "Virginia") )

THEN (UPDATE Y such that Y.LOCATION = "Virginia")
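
The constraint maintained by loc_stat_1 can be sketched in Python as follows. The dictionary representation of GOVT_PROJECT, keyed by OID, is an assumption of this sketch; the attribute names and the values "testing" and "Virginia" come from the text.

```python
from typing import Dict


def loc_stat_1(govt_project: Dict[int, Dict[str, str]], oid: int) -> bool:
    """Post-exec check for the occurrence with the given OID.

    Returns True if the occurrence had to be updated to satisfy the constraint.
    """
    y = govt_project.get(oid)  # occurrence Y with the same OID as X
    if (y is not None
            and y["STATUS"] == "testing"
            and y["LOCATION"] != "Virginia"):
        y["LOCATION"] = "Virginia"  # UPDATE Y such that Y.LOCATION = "Virginia"
        return True
    return False
```

The rule runs after the insertion (post-exec) because it needs the generic occurrence and its inherited attribute values to exist before it can test them.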

The G association system object type can also be used to model the concept of specialization.
Instead of using the hierarchy to group similar constituent object types into the generic
object type, the generic object type is specialized into different constituent object types.
In this case, all occurrences of the generic object type need not necessarily belong to some
constituent object type and occurrences can be directly inserted into the generic object type.
Different rules will support the specialization concept.

Complex objects are described by attributes which are themselves occurrences of other
object types. For example, in Figure 5.3, an object type WORK_STATION is described by
several attributes, one of which is a set of occurrences of another object type WK_ST_OPS,
representing the operations that can be performed by WORK_STATION. In our model,
complex user object types are modeled by a combination of other object types; in this
instance an aggregation (A) hierarchy is used.

WORK_STATION is modeled by an A association type with attributes WK_ST_ID
and WK_ST_OPS. The attribute WK_ST_OPS is modeled as a set of occurrences of
another user object type WK_ST_OP which, in turn, is modeled by an A association type,
with two attributes OPERATION and OP_TIME.

A complex object type can also represent a set of facts or events which result from the
interaction among occurrences of independent object types. In Figure 5.3, PROD_JOB and
WORK_STATION are two user object types representing a collection of production jobs
and work stations, respectively. An interaction is said to exist between occurrences of these
two object types whenever a production job can be executed on a work station; this
interaction is represented by another user object type WK_ST_TASK which captures the
attributes describing the interaction. These attributes are OPERATION, the operation that
the work station can perform on the production job, and OP_TIME, the time to execute this
operation.

The system object type that models an interaction is the interaction (I) association
type. Thus, the user object type WK_ST_TASK is modeled by an I association type. Each
interaction occurrence of WK_ST_TASK is identified by an OID value and has two
attributes corresponding to the OID values of the interacting occurrences. For example,
WK_ST_TASK has attributes PROD_JOB.OID and WORK_STATION.OID to store the
OID values of the occurrences of PROD_JOB and WORK_STATION.

We already discussed modeling the complex user object type WORK_STATION. The
user object type PROD_JOB is modeled by an A association as seen in Figure 5.3; it has
attributes JOB_ID and OPERATION (which has the same domain as
WORK_STATION.WK_ST_OPS.OPERATION).

There  are  a  number  of  constraints  that  can  be  defined  for  an  I  association  type;  they 
include  uniqueness  constraints,  intra-occurrence  and  inter-occurrence  constraints  similar  to 
those  defined  for  an  A  association  type,  and  a  mapping  constraint  (1-1,  1-n  and  n-m)  to 
describe  the  relationship  between  the  interacting  occurrences.  These  constraints  capture 
domain  specific  knowledge  about  the  interaction. 

Rules can also be used to capture domain specific knowledge about the actual conditions
under which an interaction may occur. These rules are not inherited by the user object
type but must be specified by the user. The user object type WK_ST_TASK may be defined
by rules that automatically derive (or delete) occurrences of WK_ST_TASK in response to
changes to the underlying object types that interact with each other. This follows a forward
chain of inference.

In contrast, rules specified for WK_ST_TASK may derive its occurrences whenever
there is a retrieval request. This would correspond to a backward inference chain. For
example, a value dependent rule inter_rul_1 examines the object occurrences of PROD_JOB
and WORK_STATION for possible interactions. The rule identifies operations that the
work station can perform on the production job and derives corresponding interaction
occurrences. Based on its RHS, inter_rul_1 is a deductive rule that derives new occurrences.
Since these derived occurrences are temporary, i.e., their existence depends on conditions
specified on the LHS of inter_rul_1, they are stored as occurrences of der_WK_ST_TASK.
The temporary user object type der_WK_ST_TASK has the same structure as
WK_ST_TASK.

A value independent trigger T10, defined for der_WK_ST_TASK, executes inter_rul_1
whenever occurrences of der_WK_ST_TASK are retrieved. This is a backward inference
chain; T10 initially binds variables on the RHS of inter_rul_1.

T10 : der_WK_ST_TASK

IF par-exec(RETRIEVE occurrence X from der_WK_ST_TASK)
THEN (EXECUTE inter_rul_1(X) : der_WK_ST_TASK)

inter_rul_1(Z) : der_WK_ST_TASK

IF (there exist occurrences X and Y of PROD_JOB and WORK_STATION
such that SET_MEMBER(X.OPERATION, Y.WK_ST_OPS.OPERATION) )
THEN (DERIVE an occurrence Z into der_WK_ST_TASK

where Z.PROD_JOB.OID = X.OID and Z.WORK_STATION.OID = Y.OID)
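
The derivation performed by inter_rul_1 amounts to pairing production jobs with the work stations that offer the required operation. The following Python sketch illustrates this under assumptions: occurrences are represented as dictionaries with an explicit "OID" key, and the function is named after the rule only for readability.

```python
from typing import Any, Dict, List


def inter_rul_1(prod_jobs: List[Dict[str, Any]],
                work_stations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Derive der_WK_ST_TASK occurrences from PROD_JOB and WORK_STATION."""
    derived = []
    for x in prod_jobs:
        for y in work_stations:
            # SET_MEMBER(X.OPERATION, Y.WK_ST_OPS.OPERATION)
            operations = {op["OPERATION"] for op in y["WK_ST_OPS"]}
            if x["OPERATION"] in operations:
                derived.append({"PROD_JOB.OID": x["OID"],
                                "WORK_STATION.OID": y["OID"]})
    return derived
```

Since the result is recomputed from the current PROD_JOB and WORK_STATION occurrences on each call, it behaves like the temporary object type der_WK_ST_TASK rather than like stored interaction occurrences.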

WK_ST_TASK is described by two attributes OPERATION and OP_TIME. The
values of these attributes can also be derived from the corresponding occurrences of
PROD_JOB and WORK_STATION, by rule inter_rul_1. The RHS consequent of
inter_rul_1 will now be as follows:

...(DERIVE an occurrence Z into der_WK_ST_TASK
where Z.PROD_JOB.OID = X.OID and Z.WORK_STATION.OID = Y.OID
and Z.OPERATION = X.OPERATION and Z.OP_TIME = F.OP_TIME
where F is an occurrence in set Y.WK_ST_OPS
such that F.OPERATION = X.OPERATION )

5.7    Modeling Composite Object Types or the "Is-A-Part-Of" Relationship

Engineering  databases  model  composite  objects  composed  of  a  collection  of  similar  or 
dissimilar  objects.  The  semantic  feature  captured  by  these  composite  objects  is  the  "is  a 
part  of"  relationship.  The  composite  object  has  a  structure  corresponding  to  a  set  of  sets, 
since  each  component  object  is  represented  by  a  set  of  occurrences.  Composite  objects  are 
also  used  to  represent  a  class  of  objects  or  a  sub-database. 

In Figure 5.4, PROD_DATA is a collection of the various components of product data
and results of tests performed on these products. The product data include descriptions and
specifications of products and the tests describe electrical, thermal and mechanical tests
conducted on these products. The various components of PROD_DATA are PROD_SPEC,
PROD_DESC, ELECT_TEST_DATA, THERMAL_TEST_DATA and
MECH_TEST_DATA.

In  our  model,  a  composition  (C)  association  type  models  a  composite  object  type.  Each 
component  of  the  composite  object  type  can  be  modeled  by  any  other  association  type 
(including  the  C  association  type,  itself).  Each  component  represents  the  entire  set  of 
occurrences  of  that  component  object  type  and  the  composite  object  occurrence  is  a  set  of 
sets.   PROD_DATA  is  comprised  of  five  sets  of  data,  corresponding  to  its  five  components. 

The "is a part of" semantics captured by the C association type is different from the G
association type in that the components of the former are a part of the composite object
type, whereas the constituents of the latter are members of the generic type. The C
association type also differs from the A association type, in which attributes describe the
object type.

In Figure 5.4, the user object type PROD_DATA is modeled by a C association type
and its component object types are modeled by A association types. For example, the
component ELECT_TEST_DATA is modeled by an A association type whose attributes describe
the electrical tests performed. ELECT_TEST_DATA has an attribute TEST_ID identifying
the test and attributes TEST_PARAM and PROD_SET describing the tests performed
rather than the products tested. PROD_SET is a set of values from the domain
PRODUCT_ID and this attribute specifies which products were tested. Similarly,
components THERMAL_TEST_DATA and MECH_TEST_DATA are also modeled by A
association types whose attributes describe the test performed. In contrast, components
PROD_SPEC and PROD_DESC describe all the products in the knowledge base, identified
by the value of attribute PRODUCT_ID, irrespective of the tests performed on them.

Any KML operation executed against a composite object type is actually executed
against its components. When data are to be retrieved from PROD_DATA, the retrieval
operations actually execute against its five components where each component is a set of
occurrences. To support this, each composite object type is defined by rules which generate
corresponding operations to be executed against its components.

The user may wish to retrieve from PROD_DATA all data relevant to a particular
product identified by a value for PRODUCT_ID. This RETRIEVE operation actually
involves five RETRIEVE operations executed against the five components. Each component
is defined by a value dependent rule comp_rul_1, . . . , comp_rul_5, respectively, which is
responsible for the actual RETRIEVE operations. Triggers T11, T12, . . . , T15, also defined
for PROD_DATA, execute the rules comp_rul_1, . . . , comp_rul_5 in response to a
RETRIEVE operation against PROD_DATA.

Note that a single trigger could also have been specified to execute these rules. The
rules are as follows:

T11 : PROD_DATA

IF par-exec(RETRIEVE from PROD_DATA all data such that
EQUAL(PRODUCT_ID, "prod_no") )

THEN (EXECUTE comp_rul_1("prod_no") : ELECT_TEST_DATA)

comp_rul_1("prod_no") : ELECT_TEST_DATA
IF (there exists an occurrence X in ELECT_TEST_DATA such that
SET_MEMBER("prod_no", X.PROD_SET) )

THEN (RETRIEVE (X.TEST_ID, X.TEST_PARAM) )


T15 : PROD_DATA

IF par-exec(RETRIEVE from PROD_DATA all data such that
EQUAL(PRODUCT_ID, "prod_no") )

THEN (EXECUTE comp_rul_5("prod_no") : PROD_DESC)

comp_rul_5("prod_no") : PROD_DESC
IF (there exists an occurrence X in PROD_DESC such that
EQUAL(X.PRODUCT_ID, "prod_no") )
THEN (RETRIEVE X)
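
The fan-out of a RETRIEVE against PROD_DATA into one retrieval per component can be sketched in Python. Only comp_rul_1 and comp_rul_5 are given in the text; that the thermal and mechanical test components behave like ELECT_TEST_DATA and that PROD_SPEC behaves like PROD_DESC are assumptions of this sketch, as are all function and variable names.

```python
from typing import Any, Callable, Dict, List

Occ = Dict[str, Any]
Rule = Callable[[List[Occ], str], List[Occ]]


def by_prod_set(occs: List[Occ], prod_no: str) -> List[Occ]:
    # Pattern of comp_rul_1: SET_MEMBER("prod_no", X.PROD_SET),
    # retrieving only TEST_ID and TEST_PARAM.
    return [{"TEST_ID": o["TEST_ID"], "TEST_PARAM": o["TEST_PARAM"]}
            for o in occs if prod_no in o["PROD_SET"]]


def by_product_id(occs: List[Occ], prod_no: str) -> List[Occ]:
    # Pattern of comp_rul_5: EQUAL(X.PRODUCT_ID, "prod_no"), retrieving X.
    return [o for o in occs if o["PRODUCT_ID"] == prod_no]


# One rule per component; the assignment for the middle three is assumed.
COMPONENT_RULES: Dict[str, Rule] = {
    "ELECT_TEST_DATA": by_prod_set,
    "THERMAL_TEST_DATA": by_prod_set,
    "MECH_TEST_DATA": by_prod_set,
    "PROD_SPEC": by_product_id,
    "PROD_DESC": by_product_id,
}


def retrieve_prod_data(prod_data: Dict[str, List[Occ]],
                       prod_no: str) -> Dict[str, List[Occ]]:
    """Execute the RETRIEVE against each of the five components."""
    return {name: rule(prod_data.get(name, []), prod_no)
            for name, rule in COMPONENT_RULES.items()}
```

The composite object type holds no occurrences of its own here; the result is simply a set of sets, one per component, which reflects the structure of the C association type.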

Other  useful  semantic  features  not  dealt  with  in  this  chapter  include  1)  classes  of 
objects  and  their  attributes,  2)  operations  representing  high  level  access  functions  that  can  be 
defined  for  object  types,  3)  recursive  structures  representing  objects  whose  attributes  are 
drawn  from  the  same  object  type,  etc.  These  features  can  also  be  captured  by  the  user 
object types and knowledge rules defined for the object types.

Figure 5.1   Describing an Object by its Attributes

Figure 5.2   A Generic Object and its Constituents

Figure 5.3   Complex Objects and their Interactions

Figure 5.4   Composite Objects and their Components


CHAPTER  VI 
THE  MECHANISM  OF  RULE  PROCESSING  IN  AN  INTEGRATED  KBMS 

A  mechanism  for  applying  rules  that  capture  problem  solving  knowledge  is  a  critical 
feature  in  the  design  of  our  KBMS  and  is  the  focus  of  this  chapter.  Processing  in  the 
integrated  KBMS  is  characterized  using  transactions.  A  match-modify-execute  (MME)  cycle 
represents  the  mechanism  used  to  apply  the  rules  defined  for  the  object  types  and 
occurrences  in  the  knowledge  base,  while  executing  a  KBMS  transaction. 

This chapter describes various aspects of the MME cycle that executes a KBMS transaction.
We describe how value independent rules that capture operational semantics are
used to directly modify the KBMS transaction. We also describe two methods for selecting
value dependent rules that capture declarative semantics: explicit and implicit selection.
These value dependent rules are executed against the knowledge base. Examples of user
object types and their associated rules, from Chapter Five, are used to illustrate this
mechanism. A prototype of the MME cycle was developed using the OPS5 production system
language and is also described. It is this prototype that laid the groundwork for
developing implementation techniques that foster the functional integration of the DBMS
and the AI reasoning components, within the KBMS.

A transaction is defined as a unit of work; it is also a unit of recovery in that the
database must be in a consistent state both before and after the execution of the transaction
[DAT77]. A typical transaction consists of a sequence (or a tree) of KML operations to be
executed against the knowledge base. These operations could be either retrieval operations
or storage manipulation operations. A transaction can be thought of as being equivalent to a
sequence (or a tree) of goals. A RETRIEVAL operation in a database transaction resembles
a Prolog-like goal and provides a declarative description of the required information. A
storage manipulation operation such as an UPDATE or DELETE does not directly
correspond to a Prolog-like goal. Processing in a forward chaining production system (PS)
[NEW73] such as OPS5 is not goal oriented and there is no simple analog to the concept of a
KBMS transaction.

In contrast to transaction processing in a conventional DBMS, where the transaction is
executed and then either committed or aborted, in the KBMS, the execution of a transaction
is controlled by a match-modify-execute (MME) cycle. This cycle allows rules to be
incorporated into the transaction and supports rule processing in the KBMS. The transaction is
matched against rules, which modify it prior to the execution of the
modified transaction. After the modified transaction is executed it will be committed.
Finally, the MME cycle selects and executes rules that have been made applicable as a
consequence of committing the (changes to the knowledge base made by the) modified transaction.

During the match phase, the operations of the initial transaction are matched against a
subset of rules defined for the object types against which the operations are to be executed.
This subset consists of those value independent rules, introduced in Chapter Four, which
test operations on their LHS. These rules were identified as triggers and the matching
operations (in the transaction) as triggering operations. The triggers are value independent since
their execution depends on the triggering operations rather than actual values of the
knowledge base object occurrences. They capture operational semantics and directly modify
the transaction without access to the object occurrences.

The modifications occur in the modify phase of the MME cycle and they depend on
both the RHS consequent and the LHS options specified in the triggers. The modifications
could incorporate either operations or value dependent rules into the transaction, as will be
discussed later. This means that the appended (new) parts of the transaction must be
matched further with appropriate value independent rules; this process continues until no
further modifications are possible.

The modified transaction, which now consists of KML operations and value dependent
rules, is then executed. The KML operations are simply executed. For rules, the LHS is
tested and if it is satisfied, then the RHS is executed. Testing the LHS of the rule can
modify the transaction; i.e., it introduces retrieval operations which must also go through the
match-modify phases before execution. Similarly, the RHS consequent conditionally
introduces operations or value dependent rules which may require further matching, etc.
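
The repeated match and modify phases can be pictured with a small Python sketch. It is a simplified illustration and not the OPS5 prototype: the table `TRIGGERS`, the `Step` tuples and the function `match_modify` are hypothetical, par-exec and post-exec steps are both placed after the triggering operation because the sketch is sequential, and the trigger table is assumed to be free of cycles so that the recursion terminates.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # e.g. ("INSERT", "TOP_SECRET_PROJ") or ("EXECUTE", "gen_hier_1")

# Triggers from Chapter Five, keyed by their triggering operation.
TRIGGERS: Dict[Step, List[Tuple[str, Step]]] = {
    ("INSERT", "TOP_SECRET_PROJ"): [
        ("pre-exec", ("EXECUTE", "gen_constr_1")),    # T5
        ("par-exec", ("EXECUTE", "gen_constr_2")),    # T6
        ("par-exec", ("EXECUTE", "gen_hier_1")),      # T7
        ("post-exec", ("EXECUTE", "loc_stat_1")),     # T9
    ],
    ("INSERT", "GOVT_PROJECT"): [
        ("pre-exec", ("OBTAIN", "LOCATION, STATUS")),  # T8
    ],
}


def match_modify(transaction: List[Step]) -> List[Step]:
    """Match each step against the triggers and splice in what they add.

    Newly added steps are matched in turn, so the process continues
    until no further modifications are possible.
    """
    modified: List[Step] = []
    for step in transaction:
        matched = TRIGGERS.get(step, [])
        before = [s for option, s in matched if option == "pre-exec"]
        after = [s for option, s in matched if option != "pre-exec"]
        modified += match_modify(before) + [step] + match_modify(after)
    return modified
```

In this sketch the rules themselves are only placed into the transaction; operations that a rule introduces when its RHS executes, such as the insertion into GOVT_PROJECT made by gen_hier_1, would be passed through `match_modify` again at that point.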

The value independent rules which test the execution status of either KML operations
or other value dependent rules modify the transaction. In contrast, the value dependent
rules which test the values of knowledge base object occurrences, flags, messages, etc., are
directly executed against the knowledge base. These rules are selected for execution by two
different methods.

In the first method, value dependent rules are explicitly selected for execution by other
rules. For example, a value independent rule which matched against the transaction will be
part of an explicit inference chain and will explicitly select a value dependent rule for execution,
using the EXECUTE construct on its RHS. This rule will be incorporated into the
KBMS transaction. In addition, a value dependent rule already in the transaction can, in
turn, explicitly select another value dependent rule. This technique for selecting value
dependent rules is called explicit selection and the selected rules are incorporated into the
KBMS transaction, for execution.

In the second method, some value dependent rules are selected during the execute phase
of the MME cycle. The execution of the operations of the modified KBMS transaction may
place the knowledge base in a state where certain conditions are satisfied by object
occurrences. Suppose these are the same conditions that are specified on the LHS of some
value dependent rules. Then, after the modified KBMS transaction is committed, these rules
must be selected for execution. The selection of these rules, in the execute phase of the
MME cycle, is not explicitly specified by some value independent rule, and is a value dependent
process. The operations executed by the modified KBMS transaction are used as a seed
or starter to select value dependent rules that are defined for the object types affected by
these operations. This technique is called implicit selection.

We have several reasons for supporting these two techniques for selecting value dependent
rules in the KBMS. As discussed in Chapter Four, the EXECUTE construct is useful
when  there  is  some  a  priori  knowledge  about  explicit  inference  chains  between  rules.  These 
chains  help  in  grouping  related  rules  and  in  passing  variables  into  rules  and  can  be  an  aid  to 
an  efficient  implementation.  However,  there  is  a  drawback  in  that  control  information  is 
embedded  in  the  rules  and  the  rules  are  not  independent  of  each  other.  There  is  overhead 
involved in ensuring that there are no dangling references; i.e., the execution of value dependent
rules that do not exist. There is also overhead in ensuring that all value dependent rules
occur  in  at  least  one  inference  chain  (so  as  to  be  useful). 

In  some  situations  we  will  not  know  a  priori  which  object  type  a  value  dependent  rule 
should  be  associated  with  or  what  operation  should  trigger  its  execution.  Thus,  explicit 
selection  fails  when  we  do  not  have  a  priori  information  about  inference  chains.  We  may 
wish  to  incrementally  add  rules  to  the  knowledge  base,  perhaps  on  a  trial  basis.  We  may 
also  wish  to  specify  value  dependent  rules  that  could  be  triggered  by  several  operations. 
Under  these  conditions,  the  implicit  selection  technique  provides  adequate  support  for  rule 
processing in the KBMS, since it selects value dependent rules whenever their LHS conditions
are satisfied.

To  make  these  two  selection  techniques  mutually  exclusive  and  prevent  overlap;  i.e., 
selecting  the  same  rule  twice,  both  explicitly  and  implicitly,  we  place  restrictions  on  the 
selection.  Only  value  dependent  rules  that  are  not  selected  in  any  explicit  inference  chains 
will  be  candidates  for  implicit  selection. 
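
The bookkeeping implied by these two techniques can be sketched as follows. This is only a minimal illustration, not the KBMS implementation: the `Rule` record, its `executes` field (the rule names selected via EXECUTE on the RHS) and the sample rule names are assumptions introduced here. The sketch reports dangling references and lists the value dependent rules that no explicit inference chain selects, which are the only candidates for implicit selection.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    value_dependent: bool
    # Hypothetical field: names of rules selected via EXECUTE on the RHS.
    executes: list = field(default_factory=list)


def dangling_references(rules):
    """Names selected by EXECUTE for which no rule is defined."""
    known = {r.name for r in rules}
    return sorted({t for r in rules for t in r.executes if t not in known})


def implicit_candidates(rules):
    """Value dependent rules that no explicit inference chain selects."""
    explicitly_selected = {t for r in rules for t in r.executes}
    return sorted(
        r.name for r in rules
        if r.value_dependent and r.name not in explicitly_selected
    )


# Illustrative rule base: t1 and t2 are triggers, r1..r3 are value dependent.
rules = [
    Rule("t1", False, ["r1"]),
    Rule("r1", True, ["r2"]),
    Rule("r2", True),
    Rule("r3", True),
    Rule("t2", False, ["r9"]),  # r9 is never defined: a dangling reference
]

print(dangling_references(rules))
print(implicit_candidates(rules))
```

Because a rule is a candidate for implicit selection only when it appears in no `executes` list, the two selection techniques cannot select the same rule twice.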

Before  we  provide  a  detailed  description  of  the  MME  cycle,  we  make  a  few  comments 
about  this  approach.  The  mechanism  we  have  briefly  described  for  applying  rules  has  the 
characteristics  of  both  the  compiled  approach  as  well  as  the  interpreted  approach.  In  the 
presence  of  rules  (the  intensional  database),  a  transaction  or  a  query  is  compiled  if  it  passes 
through two distinct phases [REI78a]. The first is a compilation phase where the query is
processed  against  the  intensional  database  alone,  to  produce  some  form  of  object  code.  The 
second  phase  is  an  execution  phase  where  the  compiled  object  code  is  executed  against  the 
extensional  database  alone  without  accessing  the  intensional  database.  If  these  two  phases 
are  not  distinct  then  the  approach  is  defined  to  be  interpretive  [REI78a]. 

Modifying  the  transaction  using  the  value  independent  rules  and  the  explicit  selection  of 
value  dependent  rules  is  comparable  to  compiling  the  transaction,  since  this  process  only 
accesses  the  rules  defined  for  the  object  types.  The  modified  transaction  can  be  considered 
some  form  of  "object"  code  to  be  executed  against  the  object  occurrences  of  the  knowledge 
base. 

However,  the  execution  of  the  modified  transaction  is  not  independent  of  rules.  For 
example, executing a value dependent rule in the transaction can further modify the transaction
by appending operations or value dependent rules. These operations and rules must also
be  matched  and  this  requires  access  to  rules  defined  for  the  object  types.  Thus,  the  MME 
cycle  is  no  longer  a  strictly  compiled  approach.  The  implicit  selection  of  value  dependent 
rules  during  the  execute  phase  of  the  MME  cycle  also  makes  this  interpretive  rather  than 
compiled;  both  object  occurrences  and  rules  are  involved. 

One of the advantages of the MME cycle is that it is a simple mechanism which does
not require the services of sophisticated pieces of software such as a theorem prover, etc.
This is partly because operational semantics that specify triggering and scheduling information
are explicitly captured in the triggers. The only support the KBMS need provide is to
match the triggering operations and rules in the transaction against the LHS of the triggers.

Another reason for the simplicity of the MME cycle is that KML constructs are used to
specify value dependent rules as well as operations in the KBMS transaction; thus, value
dependent rules can be directly incorporated into the transaction when they are explicitly
selected. The MME cycle must only be able to modify the representation of the transaction
in order to support this. This requires the ability to append operations or value dependent
rules, either during the modify phase or during the execute phase. If this occurs during the
execute  phase,  then  the  MME  cycle  must  be  re-invoked  so  that  the  new  operations  and  rules 
can  also  be  matched  against  the  triggers. 

The  MME  cycle  must  be  able  to  halt  the  execution  of  an  operation  or  rule  if  a 
corresponding  abort  flag  is  set.  Finally,  the  implicitly  selected  value  dependent  rules  are 
each  treated  as  independent  transactions  to  be  executed.  These  aspects  will  be  discussed  in 
this  chapter  which  describes  the  design  of  the  MME  cycle  and  in  the  next  chapter  which 
deals  with  KBMS  implementation  issues. 

It  is  clear  that  processing  a  KBMS  transaction  is  complex  when  compared  to  a  DBMS 
transaction.  The  MME  cycle  has  some  features  of  an  AI  rule  processing  system  as  well  as  a 
DBMS  and  requires  greater  system  support.  Another  important  consideration  is  that  its 
design must support functional integration of the DBMS and the AI rule processing components.
Functional integration allows migration of techniques from either DBMS or AI
technology  and  their  use  in  the  MME  cycle. 

Consider  conventional  DBMS  query  optimization  techniques.  It  would  be  advantageous 
for  the  MME  cycle  to  exploit  these  techniques.  Conventional  optimization  techniques  assume 
that  the  DBMS  transaction  is  not  modified  during  execution  and  this  simplifies  the  optimizer. 
However,  we  have  seen  that  the  MME  cycle  is  interpretive;  this  allows  a  KBMS  transaction 
to  get  modified  during  the  execute  phase  too. 

This  dilemma  would  seemingly  prevent  the  use  of  optimization  techniques  in  the  MME 
cycle. However, one way to resolve this conflict is to identify those parts of a KBMS transaction
which have already been modified and will not be further modified during execution.
We  can  now  apply  conventional  query  optimization  to  those  parts  alone,  wherever  they  are 
identified  in  the  KBMS  transaction.  Issues  such  as  this  that  are  relevant  to  functional 
integration  are  dealt  with  in  subsequent  chapters. 

Before we examine the details of the MME cycle we compare the inference mechanisms
involved in the different systems we have discussed. Information propagates down an
inference chain by the process of variable binding and the direction of variable binding within
a rule will determine the direction of inference. The inference mechanism in a Prolog-like
system is a backward chain; i.e., the system will work backwards from the goal and use the
goal to bind variables in the consequent of those clauses used to satisfy the goal.

In contrast, in the absence of a specific goal, a forward chaining PS such as OPS5 is in
a continuous cycle of match, selection and action. Conceptually, in each cycle, OPS5
matches all the working memory elements against the condition elements on the LHS of all
the rules to determine a conflict set of all applicable rules. Then some subset of this conflict
set is selected based on a conflict resolution strategy. In the action phase the working
memory elements are changed as specified in the RHS of the selected rules. This is an example
of a forward chain of inference since the variables of the condition elements on the LHS
of the rules are initially bound.
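
The match, selection and action cycle can be illustrated with a deliberately small sketch. It is not OPS5: working memory is reduced to a set of ground facts without variables, each hypothetical rule adds a single fact, and the conflict resolution strategy (take the first applicable rule in definition order) is an assumption made only to keep the example short.

```python
def forward_chain(facts, rules):
    """Run match-select-act cycles until the conflict set is empty.

    rules: list of (name, conditions, new_fact), where conditions is a set
    of facts that must all be present in working memory.
    """
    memory = set(facts)
    fired = []
    while True:
        # Match: every rule whose LHS holds and whose RHS would change memory.
        conflict_set = [
            rule for rule in rules
            if rule[1] <= memory and rule[2] not in memory
        ]
        if not conflict_set:
            return memory, fired
        # Select: a simplistic strategy, first applicable rule in definition order.
        name, _, new_fact = conflict_set[0]
        # Act: change working memory as specified on the RHS.
        memory.add(new_fact)
        fired.append(name)


rules = [
    ("r1", {"a"}, "b"),
    ("r2", {"b"}, "c"),
    ("r3", {"a", "c"}, "d"),
]
memory, fired = forward_chain({"a"}, rules)
print(sorted(memory), fired)
```

Inference proceeds from the conditions to the consequents: each fired rule may enable further rules in the next cycle.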


In our transaction oriented MME cycle, rules are applied using both forward and backward
chains of inference. When a triggering operation or rule matches against the LHS of a
value independent trigger, T, then variables in the LHS of T will be bound. This is a forward
chain of inference. Variable binding within T will be from the conditional LHS to the
consequent RHS.

In an explicit inference chain, rule r1 uses the EXECUTE construct to explicitly select
a value dependent rule, r2, and r1 will propagate information into r2, via parameters.
Depending on the parameters passed between r1 and r2, both forward and backward
inferencing is supported. For forward chaining, r1 will initially bind variables on the conditional
LHS of r2. For backward chaining, r1 will initially bind variables on the consequent
RHS of r2, as was seen in the transitive closure example of Chapter Five. The implicit selection
of value dependent rules during the execute phase of the MME cycle is a forward chain;
object occurrences from the knowledge base are used to initially bind variables on the conditional
LHS of these rules in order to determine which of these rules may be applicable.

We now examine the details of the match, modify and execute phases of the MME
cycle. During the match phase, the operations of the initial transaction are matched against
the appropriate subset of value independent triggers. In the subsequent modify phase, all matching
triggers modify the transaction based on their RHS consequents and the options specified on
their  LHS.  As  seen  in  Figure  6.1,  there  are  three  ways  in  which  a  triggering  operation  can 
be  modified  by  a  trigger.  When  the  trigger  uses  the  pre-exec  option,  on  the  LHS,  then  the 
RHS  consequent  of  the  trigger  is  scheduled  to  execute  before  the  triggering  operation.  If  the 
post-exec  option  is  used,  then  the  RHS  consequent  is  scheduled  to  follow  the  operation.  If 
the  par-exec  option  is  specified,  then  the  RHS  consequent  is  scheduled  for  parallel  execution. 

The RHS consequent of the trigger can execute either an operation or a value dependent
rule (specified using the EXECUTE construct). This operation or rule is directly incorporated
into the transaction. The modification could also make the execution of the triggering
operation conditional on the outcome of the RHS consequent of the trigger, as seen in the
figure.  This  must  be  noted  and  the  operation  marked  so  that  conditional  execution  can  be 
supported  later,  during  the  execute  phase.  The  modified  transaction  repeatedly  goes  through 
the  match  and  modify  phases  so  that  the  rules  and  operations  incorporated  in  the  previous 
modify  phase  can  be  matched.  A  "match  level"  is  used  as  an  indicator  that  the  match  phase 
has been re-invoked and the match level is incremented whenever newly incorporated operations
and rules are matched/modified.
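
The repeated match and modify phases can be sketched as a small worklist loop. The sketch is an illustration under stated assumptions, not the KBMS design: the `Trigger` record, the representation of the transaction as a list of parallel groups, the `max_level` guard and the sample operation names are all hypothetical. Step names are assumed to be unique and the triggers are assumed to be free of cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    on: str          # triggering operation or rule tested on the LHS
    option: str      # "pre-exec", "post-exec" or "par-exec"
    consequent: str  # operation or rule named on the RHS


def match_modify(transaction, triggers, max_level=50):
    """Return (schedule, match_level); each schedule slot is a parallel group."""
    schedule = [[step] for step in transaction]
    frontier = list(transaction)   # steps that still have to be matched
    match_level = 0
    while True:
        added = []
        for step in frontier:
            for trig in triggers:
                if trig.on != step:
                    continue
                slot = next(i for i, group in enumerate(schedule) if step in group)
                if trig.option == "pre-exec":
                    schedule.insert(slot, [trig.consequent])
                elif trig.option == "post-exec":
                    schedule.insert(slot + 1, [trig.consequent])
                elif trig.option == "par-exec":
                    schedule[slot].append(trig.consequent)
                else:
                    raise ValueError(f"unknown option: {trig.option}")
                added.append(trig.consequent)
        if not added:
            return schedule, match_level
        match_level += 1           # newly incorporated steps must be matched too
        if match_level > max_level:
            raise RuntimeError("match-modify did not terminate; cycle suspected")
        frontier = added


triggers = [
    Trigger("INSERT emp", "pre-exec", "EXECUTE check_salary"),
    Trigger("INSERT emp", "post-exec", "UPDATE dept_count"),
    Trigger("UPDATE dept_count", "par-exec", "EXECUTE log_change"),
]
schedule, level = match_modify(["INSERT emp"], triggers)
print(schedule, level)
```

The conditional execution introduced by the pre-exec option is not modelled here; the sketch only shows where each RHS consequent is scheduled relative to its triggering step.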

Value  dependent  rules  that  get  incorporated  into  the  transaction  are  modified  as  seen  in 
Figure 6.2 where a value dependent rule, rule1, matches with a value independent trigger. If
the pre-exec, post-exec or par-exec options are specified, then the modification is similar to
Figure 6.1. If the succ-exec option is specified, then the RHS consequent of the trigger is
scheduled to follow rule1, and it will be conditionally executed depending on the execution
status of rule1.

Value dependent rules that are incorporated into the transaction could also be values of
attributes  of  abstract  data  type  RULE,  specified  for  a  particular  object  occurrence.  Since 
these  rules  are  actually  the  values  of  attributes,  they  are  not  associated  with  an  execution 
status.   Thus,  these  rules  are  not  matched  against  the  value  independent  triggers. 

We must guarantee that the match and modify phases of the MME cycle eventually
terminate. If the transaction is finite; i.e., it comprises a finite number of operations against
a finite number of object types, and if there are a finite number of value independent triggers
defined for these object types, then the match and modify phases will eventually terminate,
in the absence of cycles. If there are cycles, as in the following example:

    transaction fragment:  INSERT into obj_1

    rules:  IF option(INSERT into obj_1) THEN (INSERT into obj_2)
            IF option(INSERT into obj_2) THEN (INSERT into obj_1)

then, as soon as a cycle is detected, the match and modify phases corresponding to the cycle
are terminated. Cycles which have no termination condition that depends on values in the
knowledge base could mean endless execution. To avoid this, either all cycles must be eliminated
or each cycle must be checked to make sure it can be broken. The latter is more
expensive since one must ensure that at least one operation in the cycle is conditionally executed;
i.e., it must match with a trigger whose LHS uses the pre-exec option and whose RHS
consequent can halt its execution. Alternatively, one must ensure that there is at least one
value  dependent  rule  included  in  the  cycle. 
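
One possible way to detect such cycles is a depth-first search over the graph whose edges lead from a triggering operation to the RHS consequents of the triggers it matches. The sketch below is an assumption about how this could be done, not the method used in the KBMS; the function names and the `edges` dictionary are introduced only for illustration. The second function covers just one of the two conditions discussed above, namely that a cycle includes at least one value dependent rule.

```python
def find_cycle(edges, start):
    """Return the first cycle reachable from start as a list of nodes, or None.

    edges maps a triggering operation (or rule) to the list of RHS
    consequents of the triggers that match it.
    """
    path, on_path, finished = [], set(), set()

    def visit(node):
        if node in on_path:
            return path[path.index(node):] + [node]
        if node in finished:
            return None
        path.append(node)
        on_path.add(node)
        for nxt in edges.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        finished.add(node)
        return None

    return visit(start)


def can_be_broken(cycle, value_dependent_rules):
    """A cycle may terminate if it includes at least one value dependent rule."""
    return any(node in value_dependent_rules for node in cycle)


# The two triggers of the example above, written as graph edges.
edges = {
    "INSERT into obj_1": ["INSERT into obj_2"],
    "INSERT into obj_2": ["INSERT into obj_1"],
}
cycle = find_cycle(edges, "INSERT into obj_1")
print(cycle, can_be_broken(cycle, value_dependent_rules=set()))
```

For the example the search returns the cycle through both INSERT operations, and since no value dependent rule takes part in it, the cycle is reported as one that cannot be broken.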

As seen in Figures 6.1 and 6.2, when the pre-exec option is used by the triggers, triggering
operations and rules are conditionally executed, depending on the outcome of the RHS
consequent of each trigger. Conversely, with the post-exec or succ-exec options, the RHS consequents
of the triggers are conditionally executed, depending on the execution status after
executing the triggering operations or rules. Thus, during the match-modify phases, the system
has to store some added information linking the RHS consequents of those triggers and
the corresponding triggering operations or rules. This could be in the form of abort flags
that  halt  the  execution  of  operations  or  rules.  The  system  will  also  be  responsible  for  passing 
values  of  parameters  between  the  triggering  operations  or  rules  and  the  triggers. 

A  triggering  operation  (or  rule)  could  simultaneously  match  with  several  triggers. 
When  this  occurs  there  is  a  resulting  opportunity  for  parallelism  within  a  single  transaction, 
as  will  be  seen  in  the  examples  of  the  next  section.  Parallelism  is  an  important  issue  which 
will  be  re-visited  as  we  consider  implementation  techniques  for  the  MME  cycle  in  an 
integrated  KBMS. 

The match-modify phases just described correspond to the compiled mode of the MME
cycle.  Following  the  termination  of  the  match-modify  phases,  the  modified  transaction  is 
executed;  this  is  the  execute  phase  of  the  MME  cycle. 

KML  operations  in  the  transaction  are  directly  executed  as  in  a  conventional  DBMS 
with  two  exceptions.  The  first  exception  occurs  when  an  operation  is  conditionally  executed 
as discussed. These operations are marked earlier, during the match-modify phases. Usually
the RHS consequent of the corresponding trigger executes a value dependent rule. This
rule will precede the execution of the operation and will set an abort flag if the triggering
operation must be halted. Thus, the system must check if any abort flags are set before executing
marked KML operations.
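
The check of abort flags before a marked operation can be sketched as follows. This is a simplified illustration under assumptions: rules are modelled as hypothetical Python callables that return the name of the operation to halt (or `None`), the transaction is a flat list of step names, and `check_salary` and `INSERT emp` are invented for the example.

```python
def run(schedule, marked, rules, state):
    """Execute steps in order, halting marked operations whose abort flag is set."""
    abort_flags = set()
    outcome = []
    for step in schedule:
        if step in rules:
            # A value dependent rule scheduled by a pre-exec trigger.
            target = rules[step](state)
            if target is not None:
                abort_flags.add(target)
            outcome.append((step, "rule evaluated"))
        elif step in marked and step in abort_flags:
            outcome.append((step, "halted"))
        else:
            outcome.append((step, "executed"))
    return outcome


# Hypothetical rule: halt the insertion if the salary is negative.
rules = {
    "EXECUTE check_salary":
        lambda state: "INSERT emp" if state["salary"] < 0 else None,
}
schedule = ["EXECUTE check_salary", "INSERT emp"]
marked = {"INSERT emp"}  # marked as conditional during match-modify

print(run(schedule, marked, rules, {"salary": -5}))
print(run(schedule, marked, rules, {"salary": 10}))
```

Unmarked operations skip the flag check entirely, which keeps their execution identical to that of a conventional DBMS operation.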

The  second  exception  occurs  when  the  triggering  operation  and  the  RHS  consequent  of 
the trigger are scheduled to execute in parallel. The triggering operation and the RHS consequent
usually interact with each other in this situation. For example, the RHS consequent
may execute a value dependent rule that generates information for the triggering operation
to process. Thus, while the RHS consequent is being executed (and is generating information),
the triggering operation should not complete its execution.

The  execution  of  a  value  dependent  rule  is  more  complicated.  Figure  6.2  showed  the 
case  of  rules  that  are  conditionally  executed,  associated  with  the  pre-exec  option.  These 
rules are marked during the modify phase. The corresponding abort flags are checked before
executing the rule.

Rule  execution  comprises  two  tasks;  first,  the  LHS  of  the  rule  is  verified  before  the  RHS 
consequent  is  executed.  Verifying  the  LHS  of  a  value  dependent  rule  requires  identifying 
object  occurrences  that  satisfy  some  condition  and  this  introduces  retrievals;  i.e.,  the  LHS  of 
the  rule  is  replaced  by  appropriate  retrieval  operations.  Before  these  retrieval  operations  are 
executed,  they,  too,  go  through  the  match-modify  process.  If  there  are  any  triggers  that 
match  with  these  retrieval  operations,  then  the  transaction  is  further  modified  before  the 
retrieval  operations  are  executed.  The  match  between  the  retrieval  operations  and  the 
triggers  may  also  result  in  cycles,  as  in  the  example  of  transitive  closure  to  be  discussed  in 
detail in the next section and in Chapter Eight. However, the cycles in this example have a
termination  condition  since  there  are  value  dependent  rules  included  in  the  cycles. 

If  the  LHS  of  a  value  dependent  rule  is  satisfied,  then  the  RHS  consequent  is  executed. 
The  RHS  is  itself  composed  of  further  operations  (or  rules)  and  they,  too,  must  go  through 
the  match-modify  phases  before  execution.  Executing  the  RHS  of  a  value  dependent  rule  is 
equivalent to manipulating the object occurrences of the knowledge base, which is a function
of a conventional DBMS.

Figure 6.2 also shows the case where the succ-exec option is used with a triggering rule,
in a transaction. These triggering rules are marked during the modify phase and the system
determines that the rule successfully executed; i.e., the LHS of the rule was satisfied and the
RHS consequent indeed executed, before proceeding.

We have seen that during the execution of a value dependent rule, new operations or
rules can be introduced; i.e., the transaction can be further modified. Thus, in addition to
appending these operations or rules, the MME cycle must be re-invoked so that the new
operations and rules can be further matched/modified before execution. To identify that
these operations or rules have been incorporated during execution and that the MME cycle
must be re-invoked, we make use of an "execution level."

To illustrate the use of the execution level, consider an initial transaction with an execution level
of  1.  The  operations  and  rules  appended  during  the  subsequent  modify  phases  are  also  at 
execution  level  =  1.  After  the  match-modify  phase  terminates,  the  modified  transaction  at 
execution  level  =  1  is  executed.  However,  operations  or  rules  that  are  appended  during  the 
execute  phase  are  marked  at  a  higher  execution  level.  For  example,  during  execution  of  a 
value  dependent  rule  at  execution  level,  k,  new  rules  or  operations  which  are  appended  by 
evaluating  this  rule  are  marked  at  execution  level  (k+1). 

Higher  values  for  execution  level  take  precedence,  so,  the  most  recently  appended 
operations  and  rules  are  executed  first.  Since  these  new  operations  and  rules  need  to  be 
matched,  we  suspend  execution  of  the  transaction;  i.e.,  we  suspend  the  execute  phase  of  the 
MME  cycle  at  the  current  execution  level,  k.  Next,  we  re-invoke  the  MME  cycle,  starting 
with  the  match  phase,  with  execution  level  set  to  (k+1). 

When  the  execute  phase  of  the  MME  cycle  at  execution  level  (k+1)  concludes;  i.e.,  the 
transaction  at  level  (k+1)  completes  execution,  then  we  resume  the  suspended  execute  phase 
of  the  MME  cycle  at  execution  level  k;  i.e.,  we  resume  execution  of  the  transaction  at  level  k. 
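
The suspension and resumption of execution levels behaves like a recursive call, as the following sketch suggests. It is only an illustration: the `expansions` dictionary (which steps a rule appends when it is executed) and the step names are hypothetical, the match and modify phases at each level are omitted, and the expansions are assumed to be free of cycles.

```python
def run_level(items, expansions, level=1, trace=None):
    """Execute items at one execution level, recursing into appended steps.

    expansions maps a step to the operations or rules that executing it
    appends to the transaction; these run at level + 1 before the current
    level resumes.
    """
    if trace is None:
        trace = []
    for item in items:
        trace.append((level, item))
        appended = expansions.get(item, [])
        if appended:
            # Suspend level k, re-invoke the cycle at level k+1, then resume.
            run_level(appended, expansions, level + 1, trace)
    return trace


expansions = {
    "rule_a": ["retrieve_1", "rule_b"],
    "rule_b": ["retrieve_2"],
}
trace = run_level(["rule_a", "op_x"], expansions)
for level, item in trace:
    print(level, item)
```

In the trace, `op_x` at level 1 is executed only after everything appended by `rule_a` at levels 2 and 3 has completed, which reflects the precedence of higher execution levels.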

Suspending  the  execute  phase  of  the  MME  cycle  at  level  k  and  re-invoking  the  MME 
cycle at level (k+1) to match new operations or rules against triggers no longer fits the compiled
approach. It requires access to the rules defined for object types, during execution
against  object  occurrences,  and  it  can  be  compared  to  switching  between  a  compiler  and  an 
interpreter. 

We  use  the  same  KML  constructs  to  express  the  rules  and  to  specify  operations  in  the 
transaction  and  the  match-modify  phases  (compilation  phases)  of  the  MME  cycle  do  not  use 
sophisticated theorem provers, etc. Consequently, the cost of switching between the compiled
and interpreted modes is not so excessive as to make the approach we have taken
infeasible.  However,  there  is  a  certain  overhead  involved  in  switching;  i.e.,  suspending  the 
execute  phase,  re-invoking  it,  etc.,  and  it  is  beneficial  to  attempt  to  reduce  the  number  of 
times  this  switching  must  take  place. 

One way to reduce the number of times switching takes place is to pass the transaction
through a "look-ahead" process, before execution. The look-ahead process will attempt to
identify likely candidates to be further compiled; i.e., passed through the match and modify
phases. For example, executing the LHS of a value dependent rule at execution level k introduces
retrieval operations at level (k+1). These retrieval operations may match with value
independent rules, thus re-invoking the MME cycle at level (k+1). Similarly, executing the
RHS of value dependent rules also introduces operations or rules. This happens less frequently
since the RHS of a value dependent rule is conditionally executed whereas the LHS
of the rule is almost always executed, unless an abort flag is set for the rule. Thus, the LHS
retrieval operations are better candidates for the look-ahead process.

The look-ahead process looks ahead and compiles those retrieval operations that will
actually be appended later during execution of the LHS of the value dependent rules. The
look-ahead process will effectively pass these operations through the match-modify phases
before they are actually appended. Since the system is still in the compilation mode
corresponding to the current execution level, the look-ahead process reduces the number of
switches between the compiled and interpreted modes.

Thus, before execution at level k; i.e., before the execute phase at level k, the LHS of
each value dependent rule at level k is further processed. The corresponding retrieval
operations they will introduce at level (k+1) are passed through the look-ahead process which
corresponds to the match-modify phases of the MME cycle with execution level set to (k+1).
Now, during execution at level k, when the retrieval operations at level (k+1) are introduced,
they will already have passed through the match-modify phases. This results in considerable
savings as the system does not have to suspend the execute phase with execution
level k, re-invoke the MME cycle with execution level (k+1), etc.
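
The saving can be made concrete with a toy model that counts the switches between the interpreted and compiled modes. Everything in it is an assumption made for illustration: each value dependent rule is reduced to the list of retrieval operations its LHS introduces, a trigger is reduced to a list of consequents scheduled before a retrieval, and one switch is counted for every rule whose retrievals were not compiled ahead of time.

```python
def compile_retrieval(retrieval, triggers):
    """Match-modify for one retrieval: its pre-exec consequents, then itself."""
    return list(triggers.get(retrieval, [])) + [retrieval]


def execute_level(rules, triggers, look_ahead):
    """Return (executed steps, number of suspend/re-invoke switches)."""
    precompiled = {}
    if look_ahead:
        # Still in compilation mode: process the LHS retrievals in advance.
        for retrievals in rules.values():
            for retrieval in retrievals:
                precompiled[retrieval] = compile_retrieval(retrieval, triggers)
    executed, switches = [], 0
    for retrievals in rules.values():
        if any(r not in precompiled for r in retrievals):
            switches += 1  # suspend level k, re-invoke the cycle at level k+1
        for retrieval in retrievals:
            plan = precompiled.get(retrieval) or compile_retrieval(retrieval, triggers)
            executed.extend(plan)
    return executed, switches


# Hypothetical rules (name -> LHS retrievals) and one trigger on a retrieval.
rules = {"r1": ["RETRIEVE emp"], "r2": ["RETRIEVE dept", "RETRIEVE emp"]}
triggers = {"RETRIEVE emp": ["EXECUTE check_access"]}

print(execute_level(rules, triggers, look_ahead=False))
print(execute_level(rules, triggers, look_ahead=True))
```

Both runs execute the same steps in the same order; only the number of switches differs, which is the point of the look-ahead process.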

Cycles which cross execution levels are identified during this look-ahead compilation
process. If these cycles are value dependent (they include value dependent rules) then they
are not eliminated. We note that if there are any value dependent rules in the transaction
that are attributes of data type RULE, they are not passed through the look-ahead process.
The  next  section  has  examples  describing  the  behavior  of  the  MME  cycle,  in  the  different 
situations  just  described. 

The MME cycle for any execution level k, where k > 1, completes when the
corresponding execute phase completes; i.e., when all the operations and rules in the modified
transaction with execution level k are executed. The modified KBMS transaction itself completes
execution when all operations and rules with execution level = 1 are executed. Now
the modified KBMS transaction will either be committed or aborted.

Although  the  modified  transaction  is  committed,  the  MME  cycle  (execution  level  =  1)  is 
still  not  complete;  value  dependent  rules  must  be  implicitly  selected  for  execution.  All  value 
dependent  rules,  defined  for  user  object  types,  which  are  not  explicitly  selected  for  execution 
in  at  least  one  explicit  inference  chain  are  candidates  for  implicit  selection.  Value  dependent 
rules  defined  for  particular  object  occurrences  as  attributes  of  abstract  data  type  RULE  are 
not implicitly selected since the cost of identifying if each occurs in explicit inference chains
is prohibitive.

The operations executed by the modified transaction are used as a seed or starter to
select among rules that are candidates for implicit selection. This selection strategy is similar
to that used in the TREAT algorithm [MIR86] to select applicable production rules and
will be discussed in Chapter Seven, with other implementation issues. Once value dependent
rules are implicitly selected, they will be executed much as the explicitly selected rules.

Compared to explicit selection, the implicit selection process is expensive since it selects
all the value dependent rules that may apply. Of these rules, there may be several whose
LHS is not satisfied; these rules will not be executed, and selecting them is an overhead expense.
Implicit selection of rules takes place in the MME execute phase (execution level = 1) but the
selected rules must themselves go through the MME cycle. In other words, implicit selection
requires access to the value independent triggers defined for object types. The selection process
no longer fits the definition of the compiled approach and switches between an
interpreter and a compiler. Implicit selection could take place continually during the execute
phase of the MME cycle, for different execution levels. However, this would require suspending
the current execute phase, re-invoking the MME cycle, resuming the suspended execute
phase, etc. The more frequently this occurs, the greater the overhead. To reduce this cost,
we should reduce the frequency of this process.

Another  consideration  is  that  we  must  maintain  the  transaction  oriented  nature  of  the 
MME  cycle.  Implicit  selection  occurs  in  response  to  the  changes  made  to  the  knowledge  base 
object  occurrences  by  the  operations  executed  in  the  modified  transaction.  Since  the 
modified  transaction  commits  all  its  operations  collectively,  implicit  selection  should  also 
respond to these changes collectively.

Thus, implicit selection is deferred until after the modified transaction completes execution
and is either committed or aborted. If it is committed, then we respond collectively to
all  the  changes  made  to  the  knowledge  base  and  implicitly  select  and  execute  relevant  rules. 
All  the  operations  (at  all  execution  levels)  are  used  collectively  as  a  starter.  Now,  even  if  a 
rule  is  implicitly  selected  by  more  than  one  operation,  it  will  be  executed  just  once. 

If possible, all implicitly selected rules could be executed in parallel, as is discussed in
the next chapter. Since the operations of each KBMS transaction are used collectively, the
frequency of the implicit selection process and the corresponding number of switches are also
reduced.

6.2    Example  Transaction  Fragments  in  the  MME  Cycle

We  examine  a  few  examples  of  transaction  fragments  as  they  pass  through  the  MME 
cycle  to  illustrate  the  operation  of  the  different  phases.  The  examples  describe  transaction 
modification,  parallelism  of  rules  and  operations  within  a  transaction,  re-invocation  of  the 
MME  cycle  and  cycles  of  value  dependent  rules.  The  examples  are  based  on  the  rules  defined 
for the objects described in Chapter Five.

Consider a transaction fragment with an INSERT operation against the object type
TOP_SECRET_PROJ. The rules relevant to this object type are in Figure 6.3. We now
follow this fragment through the different phases of the MME cycle as described in Figure
6.4. The operation matches, simultaneously, with several value independent triggers: T5, T6,
T7 and T9. The X occurrence on the LHS of each trigger is bound by the inserted
occurrence. In the next modify phase, the triggers modify the transaction, based on the LHS
options and the RHS consequents.

All four of these triggers occur in explicit inference chains and they explicitly select
value dependent rules and append them to the transaction. Thus, rule T5 uses the pre-exec
option to schedule a value dependent rule gen_constr_1 to precede the INSERT operation.
Rules T6 and T7 both use the par-exec option and value dependent rules gen_constr_2 and
gen_hier_1 execute in parallel with the operation. Finally, rule T9 uses the post-exec option
and as a result, rule loc_stat_1 follows the INSERT operation. These explicit inference
chains are all forward chains; triggers T5, T6, T7 and T9 initially bind variables on the LHS
of rules gen_constr_1, gen_constr_2, etc. All operations and rules of the modified transaction
have their execution level set to 1.

The INSERT operation is to be conditionally executed and it is marked to indicate that
it has triggered a rule gen_constr_1, whose RHS consequent could halt its execution. Refer
to Figure 6.4 for a stepwise representation of a fragment of a KBMS transaction as it passes
through the match, modify and execute phases.
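
The effect of the three trigger options on the modified transaction can be illustrated with a small Python sketch. It is not the KBMS implementation; the function `modify`, the triple layout `(option, rule, may_abort)` and the `may_abort` flag are assumptions introduced here for illustration. The sketch arranges the explicitly selected rules into three sequential stages around the triggering operation, and marks the operation as conditional when a pre-exec rule is able to halt it.

```python
def modify(operation, triggers):
    """Arrange explicitly selected rules around a triggering operation.

    triggers: list of (option, rule, may_abort) triples (hypothetical layout),
    where option is "pre-exec", "par-exec" or "post-exec".
    Returns three sequential stages; items inside one stage may run in parallel.
    """
    pre = [rule for option, rule, _ in triggers if option == "pre-exec"]
    par = [rule for option, rule, _ in triggers if option == "par-exec"]
    post = [rule for option, rule, _ in triggers if option == "post-exec"]
    # The operation is conditional if some preceding rule could halt it.
    conditional = any(
        may_abort for option, _, may_abort in triggers if option == "pre-exec"
    )
    label = operation + (" (conditional)" if conditional else "")
    return [pre, [label] + par, post]


# Triggers T5, T6, T7 and T9 from the example above.
triggers = [
    ("pre-exec", "gen_constr_1", True),    # T5: may reject the inserted occurrence
    ("par-exec", "gen_constr_2", False),   # T6
    ("par-exec", "gen_hier_1", False),     # T7
    ("post-exec", "loc_stat_1", False),    # T9
]
stages = modify("INSERT into TOP_SECRET_PROJ", triggers)
```

Each inner list is one stage of the modified transaction at execution level 1; the middle stage holds the three parallel branches discussed in this example.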

There are no more rules to match with the appended rules, nor does the look-ahead process
successfully match the retrieval operations which will be introduced while verifying the
LHS  of  these  appended  rules.  The  MME  cycle  thus  enters  the  execute  phase.  All  operations 
and  rules  of  the  modified  transaction  are  executed. 

If rule gen_constr_1 is successfully executed; i.e., a constraint is violated, then an
appropriate flag is set to abort the INSERT operation. After checking for this flag, the
INSERT operation is (conditionally) executed. In parallel with the INSERT operation, rules
gen_constr_2 and gen_hier_1 are also executed. If the LHS of either gen_constr_2 or
gen_hier_1 is satisfied, then the RHS consequent of that rule is executed. The RHS
consequents of these rules modify the transaction and append operations. Since these
appended operations may match against some value independent rules, the MME cycle has
to be re-invoked.

The RHS consequent of rule gen_hier_1 appends an INSERT operation against the
object GOVT_PROJECT. To identify that the transaction has been modified during execution,
the INSERT operation is appended with execution level set to 2. Since a higher execution
level has precedence and the INSERT operation has not been matched, the current execute
phase is suspended and the MME cycle is re-invoked with an execution level set to 2.
The INSERT operation matches with trigger T8. In the next modify phase, a value dependent
rule attr_inh_1 precedes this INSERT operation; these operations and rules are marked
at execution level 2 (see Figure 6.4).

There will be no more matches at execution level 2 and the operations and rules at
level 2 are executed. First attr_inh_1 executes and obtains values from the user for the
inherited attributes LOCATION and STATUS. Then the INSERT into GOVT_PROJECT
executes. After all the operations and rules at level 2 complete, we resume the execute phase
with execution level = 1.

In this example, there are three parallel branches within the transaction at level 1,
corresponding to the INSERT into TOP_SECRET_PROJ, rule gen_constr_2, and the third
branch corresponding to rule gen_hier_1. This third branch caused the MME cycle to be
re-invoked at level 2. After the three parallel branches at level 1 complete execution, rule
loc_stat_1 at level 1 is executed.

We also use this example to describe the implicit selection of value dependent rules.
Suppose that rules gen_constr_2 and loc_stat_1 are not included in any explicit inference
chains; i.e., rules T6 and T9 are not defined for object type TOP_SECRET_PROJ. In this
case, when the transaction is modified, there will be two parallel branches at level 1,
corresponding to the INSERT into TOP_SECRET_PROJ and rule gen_hier_1. Neither
gen_constr_2 nor loc_stat_1 will be included in the modified transaction.

After the modified transaction completes execution, it will be committed. Next, the
operations of the committed transaction, INSERT into TOP_SECRET_PROJ and INSERT
into GOVT_PROJECT (conditional), are used in implicit selection. The two rules
gen_constr_2 and loc_stat_1 are now candidates for implicit selection. Both these rules test
occurrences of an affected object type, TOP_SECRET_PROJ, and they will be selected. See
Figure 6.4 for these implicitly selected rules (enclosed by the dotted lines). Note that rule
loc_stat_1 will also be selected by the INSERT into GOVT_PROJECT, but it will be executed
just once. The two implicitly selected rules can be treated as independent transactions
and are candidates for concurrent execution. This is discussed in Chapter Seven.
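
The deferred, collective form of implicit selection described here can be sketched in Python as follows. This is an illustrative simplification under stated assumptions, not the selection mechanism of the KBMS: the classes `Operation` and `Rule`, the field `tests`, and the rule `unrelated` with its object type `EMPLOYEE` are hypothetical. A rule is treated as relevant when its LHS tests occurrences of any object type affected by the committed transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    kind: str          # e.g. "INSERT"
    object_type: str   # object type whose occurrences are changed


@dataclass(frozen=True)
class Rule:
    name: str
    tests: frozenset   # object types whose occurrences the LHS tests


def implicitly_select(committed_ops, candidate_rules):
    """Use all committed operations collectively as the starter.

    A rule relevant to several operations is still returned only once.
    """
    affected = {op.object_type for op in committed_ops}
    return [rule for rule in candidate_rules if rule.tests & affected]


ops = [
    Operation("INSERT", "TOP_SECRET_PROJ"),
    Operation("INSERT", "GOVT_PROJECT"),
]
rules = [
    Rule("gen_constr_2", frozenset({"TOP_SECRET_PROJ", "MILITARY_PROJ"})),
    Rule("loc_stat_1", frozenset({"TOP_SECRET_PROJ", "GOVT_PROJECT"})),
    Rule("unrelated", frozenset({"EMPLOYEE"})),   # hypothetical rule, never selected
]
selected_names = [rule.name for rule in implicitly_select(ops, rules)]
```

Because the affected object types are collected into one set before any rule is examined, loc_stat_1 is selected once even though both INSERT operations touch object types that it tests.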

We now examine another example with a backward inference chain and a cycle of value
dependent rules. The relevant rules defined for the object type der_P*S_P are in Figure 6.5.
Consider a transaction fragment that executes a RETRIEVE operation against the
der_P*S_P object type. This operation matches with triggers T4 and T4' and two value
dependent rules trans_cl_1 and trans_cl_2 are appended to the transaction, to execute in
parallel with the RETRIEVE operation (see Figure 6.6a). This is an example of a backward
inference chain; T4 and T4' use the goal, i.e., the occurrences of der_P*S_P that are to be
retrieved, to bind the derived der_P*S_P occurrence, Z, on the RHS of trans_cl_1 and
trans_cl_2. These bindings will then be passed within the two rules from the RHS to the
LHS.

The rule trans_cl_2 causes a recursive cycle of value dependent rules. To satisfy its
LHS, trans_cl_2 appends a retrieval operation against der_P*S_P (see Figure 6.6b). This
retrieval operation matches with rules T4 and T4' which (recursively) append trans_cl_1 and
trans_cl_2 (see Figure 6.6c). This cycle will not cause endless execution; it includes value
dependent rules which terminate due to finite object occurrences.
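
The termination argument can be made concrete with a naive fixpoint computation in Python. The sketch is an assumption-laden illustration rather than the rule processing of the KBMS: it represents P*S_P as a dictionary from a part to its direct sub-parts, keeps a single derived occurrence per part (so repeated derivations for the same PART are merged), and uses invented part names. The recursion stops as soon as one full pass derives nothing new, which must happen because the set of parts is finite.

```python
def transitive_closure(base):
    """base maps a part to the set of its direct sub-parts (the P*S_P occurrences)."""
    # trans_cl_1: every P*S_P occurrence is also a der_P*S_P occurrence.
    derived = {part: set(subs) for part, subs in base.items()}
    changed = True
    while changed:                      # re-apply trans_cl_2 until nothing new appears
        changed = False
        for subs in derived.values():
            for q in list(subs):        # Q.PART is a member of P.SUB_PARTS
                new = base.get(q, set()) - subs
                if new:
                    subs |= new         # Z.SUB_PARTS = P.SUB_PARTS union Q.SUB_PARTS
                    changed = True
    return derived


# Invented sample data for illustration only.
base = {"car": {"engine", "wheel"}, "engine": {"piston"}, "piston": {"ring"}}
closure = transitive_closure(base)
```

The loop also terminates on cyclic data, for example when two parts list each other as sub-parts, since each pass can only add members drawn from a finite set.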




This recursive cycle is actually detected during the look-ahead process. Prior to the
execute phase corresponding to an execution level = 1, the transaction passes through the
look-ahead process which processes the operations that will be appended at level 2 (by executing
the LHS of the value dependent rules at level 1). Rule trans_cl_2 at level 1 will
append a RETRIEVE operation against der_P*S_P at level 2. The look-ahead process
re-invokes the MME cycle at level 2, to match this RETRIEVE operation. The operation
matches with rules T4 and T4' and (recursively) appends rules trans_cl_1 and trans_cl_2, at
execution level 2. This cycle involving rule trans_cl_2 spans execution levels; it is detected
and the look-ahead processing for rule trans_cl_2 terminates. The cycle is not eliminated
since it includes value dependent rules. We use this example in Chapter Eight to discuss the
use of database query optimization techniques to efficiently process linear recursive rules.

6.3    Simulating  the  MME  Cycle  Using  a  Production  System 

The mechanism of applying rules in the KBMS was simulated using the OPS5 Production
System Language [FOR81]. We use the term simulation since the KBMS transaction
was  not  actually  executed  against  object  occurrences  of  a  knowledge  base.  The  objective 
was  to  verify  the  operation  of  the  various  phases  of  the  MME  cycle  and  to  obtain  insights  to 
support  an  efficient  implementation  of  an  integrated  KBMS. 

The  OPS5  system  has  a  single  processing  paradigm  of  production  rules.  Thus,  we  use 
production  rules  to  model  the  system  knowledge  and  the  domain  knowledge.  The  "domain" 
production  rules  model  the  value  independent  and  value  dependent  rules  defined  for  the 
object  types  and  occurrences,  specific  to  each  application.  On  the  other  hand,  the  "system" 
production  rules  control  the  different  phases  of  the  MME  cycle.  This  includes  matching  the 
transaction  against  a  relevant  subset  of  the  value  independent  domain  rules,  modifying  the 
transaction by incorporating value dependent domain rules and executing the modified
transaction.

The OPS5 system was chosen because it provides excellent support for production rules
and  pattern  matching  and  because  it  has  been  widely  used  to  implement  several  rule-based 
expert  systems.  However,  processing  in  a  production  system  (PS)  is  significantly  different 
from  the  processing  strategy  for  the  KBMS  that  we  described,  resulting  in  several  limitations 
to  our  simulation.  Ironically,  these  very  limitations  provided  the  greatest  benefit,  since  they 
provided  insight  into  the  functionality  of  the  KBMS  components  and  the  MME  cycle.  The 
different  implementation  issues  and  strategies  discussed  in  the  rest  of  this  thesis  originated 
from the limitations of this OPS5 simulation.

Conceptually,  there  are  many  differences  between  the  mechanism  of  applying  rules  in  a 
PS and in the KBMS. The OPS5 production system is in a continuous cycle of match, select
and act. During each cycle, all the productions or rules are candidates to be selected for
execution and the candidates whose conditions are satisfied form a conflict set. The language
definition of OPS5 only allows a single production rule to be fired at any instant and there are
two built-in conflict resolution strategies, LEX and MEA, that determine which rule is to be
chosen, from the conflict set, for execution.
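
A minimal Python sketch of such a match, select and act cycle is given below to make the contrast with the MME cycle easier to follow. It is not OPS5: the function `run_production_system`, the tuple layout of a production and the sample productions are hypothetical, and the selection step simply takes the first satisfied production in definition order as a stand-in for LEX or MEA.

```python
def run_production_system(working_memory, productions, max_cycles=100):
    """productions: list of (name, condition, action) tuples (hypothetical layout).

    Exactly one production is fired per cycle.
    """
    fired = []
    for _ in range(max_cycles):
        # Match: every production is a candidate; satisfied ones form the conflict set.
        conflict_set = [p for p in productions if p[1](working_memory)]
        if not conflict_set:
            break
        # Select: placeholder strategy (definition order), not LEX or MEA.
        name, _condition, action = conflict_set[0]
        # Act: fire the single selected production.
        action(working_memory)
        fired.append(name)
    return fired


wm = {"count": 0}
productions = [
    ("increment", lambda m: m["count"] < 3,
     lambda m: m.update(count=m["count"] + 1)),
    ("report", lambda m: m["count"] == 3 and "done" not in m,
     lambda m: m.update(done=True)),
]
trace = run_production_system(wm, productions)
```

Note that every production is re-examined in every cycle and that there is no notion of a transaction being modified; these are the points at which the simulation had to depart from the normal production system behaviour.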

For  the  simulation  of  the  control  strategy  of  the  MME  cycle  in  a  transaction  oriented 
KBMS, we had to circumvent the OPS5 selection strategy at times and implement our own
selection  strategy  so  as  to  accurately  model  the  operation  of  the  MME  cycle.  For  example, 
during  the  match  phase,  the  MME  cycle  only  uses  a  subset  of  the  rules;  i.e.,  the  value 
independent  triggers  that  match  against  the  triggering  operations  and  rules.  During  the 
modify  phase,  the  matching  triggers  are  used  to  modify  the  transaction  and  to  schedule 
operations  and  value  dependent  rules  for  execution.  The  transaction  can  also  be  modified 
during  the  execution  of  the  value  dependent  rules.  There  is  no  equivalent  to  the  concepts  of 
a  transaction,  modifications  made  to  a  transaction  or  the  process  of  explicit  selection  of  rules 
in  the  OPS5  system.  The  implicit  selection  of  value  dependent  rules  in  the  execute  phase  of 
the MME cycle most resembles the normal processing of the OPS5 system.

In addition to these differences, there are some further drawbacks to using OPS5.
OPS5  has  neither  a  formal  knowledge  representation  model  nor  a  set  oriented  processing 
strategy.  The  lack  of  a  representation  model  was  overcome  by  simulating  a  network 
representation  so  that  data  and  relevant  rules  could  be  grouped  together  within  an  object 
type.  The  lack  of  set  processing  features  was  handled  through  the  use  of  tags  to  group 
together  those  operations  which  belong  to  a  single  set  oriented  operation. 

The following is a brief description of our simulation of the MME cycle using OPS5.
The  initial  transaction  is  structured  as  a  sequence  of  operations  against  objects.  The  match 
level  and  the  execution  level  of  these  operations  are  initialized  to  1.  The  match  level  is  used 
to  identify  new  operations  and  rules  appended  to  the  transaction  during  the  modify  phase, 
after  successfully  matching  against  value  independent  triggers  in  the  match  phase.  The  exe- 
cution level  is  used  to  identify  new  operations  and  rules  appended  to  the  transaction  during 
the  execute  phase  of  the  MME  cycle. 

Operations  from  the  initial  transaction  are  matched  individually  against  the  value 
independent  triggers  defined  for  the  relevant  object  types.  If  the  match  is  successful,  then,  in 
the  following  modify  phase,  the  operation  (parent)  is  attached  to  a  modification  structure 
(child)  whose  match  level  and  execution  level  are  initialized  to  1.  Parent  and  child  pointers 
are  used  with  the  modification  structure.  This  modification  structure  is  used  to  capture  the 
modifications  to  the  (parent)  operation  of  the  transaction.  The  structure  points  to  new 
operations  and  rules  appended  in  the  modify  phase.  These  operations  and  rules  are  properly 
sequenced  using  the  options  specified  on  the  LHS  of  the  value  independent  triggers.  The 
matching  triggers  are  also  marked  with  the  corresponding  match  and  execution  levels  (=  1); 
this  is  used  later  to  detect  cycles,  as  will  be  seen. 

Each  of  the  new  operations  and  value  dependent  rules  pointed  to  by  the  modification 
structures  must  also  be  matched  and  modified;  this  is  done  recursively.  If  the  match  is  suc- 
cessful, then  this  new  operation  or  rule  (which  already  has  a  pointer  from  a  parent 
modification structure) will now point to its own child modification structure. The match
level of this child modification structure will be incremented by 1 from the previous match
level  so  that  the  child  modification  structures  can  be  differentiated  from  the  parent.  The 
execution  level  is  still  set  to  1. 

Potential cycles are easily detected when there is a match with a trigger which has
been marked to indicate a previous match at a lower match level. To detect actual cycles
(from potential cycles), the modification structures are traversed backwards, via the parent
pointers, until an actual cycle is detected or a root operation at match level = 1 is encountered.
Once detected, these cycles must either be eliminated or examined to ensure they will
not  result  in  endless  execution.  The  match-modify  phases  terminate  when  no  more  matches 
are  possible. 
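
The backward traversal used to separate actual cycles from potential ones can be sketched in Python. The sketch rests on an assumption that is not spelled out in the text: an actual cycle is taken to mean that the same trigger already appears on the path from the current modification structure back to its root. The class `ModStructure`, the function `is_actual_cycle` and the trigger names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModStructure:
    trigger: str                               # trigger that produced this structure
    match_level: int
    parent: Optional["ModStructure"] = None    # None for a root at match level 1


def is_actual_cycle(node: ModStructure) -> bool:
    """Follow parent pointers toward the root, looking for the same trigger."""
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.trigger == node.trigger:
            return True
        ancestor = ancestor.parent
    return False


root = ModStructure("T1", 1)
child = ModStructure("T2", 2, root)
repeat = ModStructure("T1", 3, child)    # T1 reappears on its own ancestor path
sibling = ModStructure("T2", 2, root)    # T2 was marked earlier, but on another branch
```

Here `sibling` is only a potential cycle: trigger T2 is already marked from an earlier match, yet it does not occur among the ancestors of `sibling`, so the traversal reaches the root without finding it.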

The modified transaction at the current execution level (= 1) is now processed by the
look-ahead process which re-invokes the MME cycle at the next execution level (= 2), if
necessary. This process works as follows: For all value dependent rules at execution level = 1,
the retrieval operations that will be appended to satisfy their LHS are marked at execution
level = 2. The corresponding match level is initialized to 1. These retrieval operations
now pass through the match-modify phases of the MME cycle, with their execution level set to 2.
If there is a match, then a modification structure, also at execution level = 2, is attached.

In  addition  to  cycles  with  the  same  execution  level,  there  is  now  a  possibility  of  cycles 
that  cross  execution  levels.  Again,  such  potential  cycles  are  first  identified  when  there  is  a 
match  with  a  trigger  which  is  marked  to  indicate  a  match  at  a  lower  execution  level.  Actual 
cycles  are  identified  by  traversing  backwards  via  the  modification  structures,  as  before. 

In the transitive closure example discussed in the previous section, a retrieval operation
executed on the LHS of rule trans_cl_2 (execution level = 1) matches with the same rule
trans_cl_2 (execution level = 2). These cycles do not have to be eliminated if they involve
value dependent rules; however, unless they are detected the look-ahead process will not
terminate.

After the look-ahead process terminates, the modified transaction (operations, rules and
modification  structures)  whose  execution  level  is  initialized  to  1,  is  executed.  First,  the 
operations  and  rules  corresponding  to  match  level  1  are  executed.  When  a  modification 
structure  is  encountered,  the  match  level  is  incremented  and  this  structure  is  used  to  obtain 
the  new  operations  and  rules.   These  too,  will  retain  an  execution  level  of  1. 

During  execution  of  the  value  dependent  rules,  the  transaction  may  be  modified  and 
new rules or operations may be introduced. However, the execution level of these new operations
or rules is incremented by 1. Operations and rules at higher execution levels have
precedence.  Thus,  execution  at  level  e  is  halted  when  operations  or  rules  at  level  (e+1)  are 
appended  to  the  transaction. 

Those operations at level (e+1) that have already passed through the look-ahead process
do not require further matching. If they had been modified during the look-ahead process,
they will be attached to a modification structure with execution level (e+1) and match level
= 1. For those operations that have not been through the look-ahead process, the MME
cycle is re-invoked with an execution level of (e+1) and with the match level re-initialized to
1.
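
The precedence of higher execution levels can be illustrated with a short Python sketch. It is a deliberately serialised simplification, in the spirit of the simulation rather than of the intended KBMS: matching, look-ahead and parallel branches are all omitted, the function `execute` and the mapping `appends` are hypothetical, and the mapping is assumed to contain no cycles. An item that appends new items is suspended until everything at the next level has completed.

```python
def execute(transaction, appends):
    """Run items level by level; a higher execution level takes precedence.

    transaction: item names at execution level 1, already sequenced.
    appends: maps an item to the items its execution appends at the next level.
    Returns (item, level) pairs in completion order.
    """
    completed = []

    def run_level(items, level):
        for item in items:
            new_items = appends.get(item, [])
            if new_items:
                # Execution at `level` is suspended until level + 1 completes.
                run_level(new_items, level + 1)
            completed.append((item, level))

    run_level(transaction, 1)
    return completed


transaction = [
    "gen_constr_1",
    "INSERT into TOP_SECRET_PROJ",
    "gen_constr_2",
    "gen_hier_1",
    "loc_stat_1",
]
# gen_hier_1 appends the level 2 items, already sequenced by trigger T8.
appends = {"gen_hier_1": ["attr_inh_1", "INSERT into GOVT_PROJECT"]}
order = execute(transaction, appends)
```

In the resulting order, the two level 2 items complete before gen_hier_1 and loc_stat_1 at level 1, which mirrors the suspension and resumption of the execute phase described for the TOP_SECRET_PROJ example.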

After  all  the  operations  and  rules  of  the  modified  transaction  complete  execution,  the 
changes  made  to  the  knowledge  base  are  committed.  To  complete  the  execute  phase  of  the 
MME  cycle  with  execution  level  1  requires  the  completion  of  the  implicit  selection  process. 
The  implicit  selection  and  execution  of  value  dependent  rules  during  the  execute  phase  does 
not need elaborate support in OPS5, since this closely resembles the OPS5 strategy of selecting
rules. The MME cycle completes when no more rules are implicitly selected.

OPS5 productions are compiled into a Rete network and the efficiency of OPS5 is attributed
to the efficiency of the Rete algorithm [FOR82]; however, there are several characteristics
of a KBMS that reduce the efficiency of the Rete algorithm. This will be discussed
in detail with other implementation issues in the next chapter where we will suggest alternative
methods to structure and select rules.

The use of the object-oriented paradigm to structure rules and to provide a binding
between data and relevant rules is an important feature of our KBMS. However, the Rete
network  does  not  allow  such  a  structuring  of  rules.  Although  the  Rete  network  does  bind 
data  and  relevant  rules,  it  does  this  by  structuring  the  data  using  the  rules,  in  contrast  to 
structuring  both  data  and  rules  using  object  types  as  is  suggested  for  the  object-oriented 
KBMS. 

One  of  the  shortcomings  of  the  Rete  algorithm  is  the  lack  of  support  for  set  oriented 
operations.  This  precludes  the  use  of  efficient  set  oriented  DBMS  strategies  to  support  rule 
processing in a KBMS. As a result, the functional integration of a DBMS with a rule-based
system, within the KBMS, which is a cornerstone of our research, could not be simulated.
Functional  integration  of  the  KBMS  components  is  the  topic  of  Chapter  Seven. 

OPS5  also  does  not  permit  a  rule  to  execute  another  rule,  whereas  this  is  an  important 
feature  of  our  scheme  for  building  explicit  inference  chains,  where  a  trigger  explicitly  selects 
a  value  dependent  rule  during  the  modify  phase.  This  feature  had  to  be  simulated  in  our 
model  using  side  effects  and  a  blackboard. 

OPS5 only allows a single production to be fired at any instant. Our simulated system
could not differentiate between system and domain production rules and always selected a
single rule for execution. As a result, we had to alternate between supporting the functions
of the MME cycle and the execution of the value dependent domain rules. For example, we
had to suspend the execute phase (at level k) when an operation or rule was appended which
required the re-invocation of the MME cycle at level (k+1). We could resume the execute
phase at level k only after the completion of the MME cycle at level (k+1). Thus, we could
not simulate possible parallelism between phases at different execution levels.

This also meant that the potential for parallelism within a single KBMS transaction
could not be simulated since the execution of the transaction was also modeled using productions.
The parallel execution of triggering operations or rules and the RHS consequent of the
triggers that was described in Figures 6.1 and 6.2 and which occurred in the example transaction
fragments of the previous section (see Figures 6.4 and 6.6) was not simulated.

The  serialized  execution  of  production  rules  in  OPS5  (and  most  rule-based  systems) 
guarantees  the  correctness  of  rule  execution.  However,  in  the  interests  of  execution 
efficiency,  it  is  useful  to  investigate  the  behavior  of  the  MME  cycle  if  this  limitation  were  not 
imposed.  Research  in  the  concurrent  execution  of  DBMS  transactions  has  resulted  in  a  seri- 
alizability  criterion  for  correctness  of  an  interleaved  execution  of  concurrent  transactions  and 
algorithms  that  guarantee  serializability.  In  the  next  chapter  we  show  that  the  serializability 
criterion  could  also  be  applied  to  concurrent  execution  of  rules  in  a  KBMS. 

As mentioned earlier, it was from the limitations of our OPS5 simulation that we
gained  much  insight  into  the  task  of  adequately  supporting  the  MME  cycle  in  the  KBMS. 
The  simulation  helped  to  identify  several  implementation  issues  that  will  be  discussed  next. 




[Figure; caption not recovered. For each trigger option, the diagram shows the triggering
operation in the transaction, the trigger, and the resulting modified transaction.]

Triggering operation in transaction: INSERT into object_i

Trigger                                  Modified transaction

IF pre-exec(INSERT into object_i)        RHS consequent, followed by
THEN (RHS consequent)                    INSERT into object_i

IF post-exec(INSERT into object_i)       INSERT into object_i, followed by
THEN (RHS consequent)                    RHS consequent

IF par-exec(INSERT into object_i)        INSERT into object_i, in parallel with
THEN (RHS consequent)                    RHS consequent


+  conditionally executed if no abort flag is set

T5 : TOP_SECRET_PROJ
IF pre-exec(INSERT an occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE gen_constr_1(X) : GOVT_PROJECT)

gen_constr_1(X) : GOVT_PROJECT
IF (for an inserted occurrence X of TOP_SECRET_PROJ, there exists
    an occurrence Y of NON_MILITARY_PROJ, such that EQUAL(X.OID, Y.OID) )
THEN (alert the KBMS to reject occurrence X)

T6 : TOP_SECRET_PROJ
IF par-exec(INSERT an occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE gen_constr_2(X) : GOVT_PROJECT)

gen_constr_2(X) : GOVT_PROJECT
IF (for an inserted occurrence X of TOP_SECRET_PROJ
    NOT_SET_MEMBER(X.OID, MILITARY_PROJ.OID) )
THEN (INSERT an occurrence Z into MILITARY_PROJ where Z.OID = X.OID )

T7 : TOP_SECRET_PROJ
IF par-exec(INSERT an occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE gen_hier_1(X) : GOVT_PROJECT)

gen_hier_1(X) : GOVT_PROJECT
IF (for occurrence X inserted into a constituent of GOVT_PROJECT
    NOT_SET_MEMBER(X.OID, GOVT_PROJECT.OID) )
THEN (INSERT an occurrence P into GOVT_PROJECT
    where P.OID = X.OID )

T8 : GOVT_PROJECT
IF pre-exec(INSERT an occurrence X into GOVT_PROJECT)
THEN (obtain values for attributes X.LOCATION and X.STATUS from user)

T9 : TOP_SECRET_PROJ
IF post-exec(INSERT occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE loc_stat_1(X) : GOVT_PROJECT)

loc_stat_1(X) : GOVT_PROJECT
IF (for an inserted occurrence X of TOP_SECRET_PROJ, there exists
    an occurrence Y of GOVT_PROJECT, such that EQUAL(X.OID, Y.OID) AND
    EQUAL(Y.STATUS, "testing") AND NOT_EQUAL(Y.LOCATION, "Virginia") )
THEN (UPDATE Y such that Y.LOCATION = "Virginia")


Figure 6.3    Knowledge Rules Relevant to TOP_SECRET_PROJ

T4 : der_P*S_P
IF par-exec(RETRIEVE occurrence X from der_P*S_P)
THEN (EXECUTE trans_cl_1(X) : der_P*S_P)

trans_cl_1(Z) : der_P*S_P
IF (there exists an occurrence Y of P*S_P)
THEN (DERIVE occurrence Z of der_P*S_P such that Z = Y)

T4' : der_P*S_P
IF par-exec(RETRIEVE occurrence X from der_P*S_P)
THEN (EXECUTE trans_cl_2(X) : der_P*S_P)

trans_cl_2(Z) : der_P*S_P
IF (there exist occurrences P and Q of der_P*S_P and P*S_P, respectively, such that
    SET_MEMBER(Q.PART, P.SUB_PARTS) )
THEN (DERIVE occurrence Z of der_P*S_P where Z.PART = P.PART
    AND Z.SUB_PARTS = the union of P.SUB_PARTS and Q.SUB_PARTS)


Figure 6.5    Knowledge Rules Relevant to the Object der_P*S_P

Figure 6.6    Transaction Fragment with Cycle of Value Dependent Rules
a) Modified transaction fragment at level 1  b) Transaction fragment executing at
level 1  c) Transaction fragment executing at levels 1 and 2


CHAPTER VII
A  DBMS  APPROACH  TO  PROCESSING  KBMS  TRANSACTIONS 

We used a transaction oriented paradigm to characterize processing in the KBMS and
described  the  operation  of  the  MME  cycle  that  executes  a  transaction.  In  this  chapter  and 
the  next  we  investigate  possible  implementation  strategies  that  could  be  used  to  support  the 
MME  cycle  in  the  integrated  KBMS. 

A key feature of our design is the use of a KML to specify both operations and rules.
The initial KBMS transaction, itself, comprises KML operations. Consequently, value dependent
rules are incorporated into a transaction and the execution of a rule is characterized
using  concepts  from  conventional  DBMS  processing.  Verifying  the  LHS  of  a  value  dependent 
rule  corresponds  to  retrieval  of  knowledge  base  object  occurrences  and  applying  the  RHS 
corresponds  to  executing  storage  manipulation  on  these  occurrences. 
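
This characterization of a rule as a retrieval followed by storage manipulation can be sketched in Python. The sketch is illustrative only: `execute_rule` is a hypothetical helper, object occurrences are modeled as plain dictionaries, and the sample GOVT_PROJECT occurrences (including the location "Ohio") are invented. The LHS and RHS are loosely modeled on rule loc_stat_1 from Chapter Six, without the join against the inserted TOP_SECRET_PROJ occurrence.

```python
def execute_rule(occurrences, lhs, rhs):
    """Verify the LHS by retrieval, then apply the RHS to each retrieved occurrence."""
    matched = [occ for occ in occurrences if lhs(occ)]   # retrieval
    for occ in matched:
        rhs(occ)                                         # storage manipulation
    return len(matched)


# Invented sample occurrences for illustration only.
govt_project = [
    {"OID": 1, "STATUS": "testing", "LOCATION": "Ohio"},
    {"OID": 2, "STATUS": "active", "LOCATION": "Ohio"},
]
updated = execute_rule(
    govt_project,
    lhs=lambda y: y["STATUS"] == "testing" and y["LOCATION"] != "Virginia",
    rhs=lambda y: y.update(LOCATION="Virginia"),
)
```

Seen this way, the LHS is an ordinary selection over object occurrences and the RHS an ordinary update, which is why techniques developed for DBMS retrieval and update processing are candidates for reuse in rule processing.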

A  common  characterization  leads  to  functional  integration  of  DBMS  processing  and 
rule processing within the KBMS. Given this functional similarity in the KBMS components,
one goal, then, is to eliminate any redundancy from a functionally integrated KBMS. In addition
to eliminating redundancy, functional integration will also allow the migration of techniques
such as optimization, the concurrent execution of transactions, etc., from the DBMS,
and  their  incorporation  in  the  KBMS.  All  these  techniques  have  been  well  researched  in  the 
context  of  conventional  DBMS. 

Functional  integration  is  key  to  understanding  issues  for  the  efficient  implementation  of 
a  KBMS.  Although  we  have  not  attempted  any  implementation  of  the  KBMS,  beyond  the 
OPS5  prototype  described  in  Chapter  Six,  we  have  investigated  several  implementation 
issues to be discussed in the rest of this chapter and the next chapter.

This chapter is organized as follows: First, we introduce a performance measure to
guide the implementation of the MME cycle. Next, we review available techniques for
processing rules, both in AI expert systems and extended DBMS, identifying advantages and
shortcomings.

We then investigate several implementation issues. The first issue concerns the
structuring of rules in the knowledge base. The object-oriented paradigm was used to structure
data and relevant rules within the object types of the knowledge base. However, several
rules may be defined for or inherited by a single user object type and depending on the
category of the rules, they are used differently. In the interests of an efficient
implementation, the rules defined for a single object type also have to be structured; this structuring
should reflect and exploit differences in usage, in the MME cycle.

In the interests of efficient retrieval, it is important to identify the context of object
occurrences  relevant  to  the  LHS  condition  of  a  rule.  Part  of  this  context  is  specified  by  the 
operations  of  the  transaction  that  trigger  the  explicit  or  implicit  selection  of  these  rules  and 
we  examine  the  context  information  provided  by  various  operations. 

Next,  we  study  the  migration  of  features  from  a  DBMS  to  a  KBMS.  One  feature  deals 
with  the  increased  efficiency  resulting  from  the  interleaved  execution  of  concurrent  DBMS 
transactions.  If  we  can  identify  and  isolate  KBMS  transactions  that  can  be  executed  in 
parallel,  then  we  can  benefit  from  concurrency.  The  approach  taken  is  first,  to  identify 
sources  of  parallelism  within  the  MME  cycle  that  executes  transactions  and  isolate  a  set  of 
independent  transactions.  Then  we  extend  the  serializability  criterion  for  the  correctness  of 
concurrent  DBMS  transactions  to  the  KBMS.  We  prove  that  the  concurrent  execution  of  a 
set  of  KBMS  transactions  is  equivalent  to  a  particular  serial  execution  of  the  same  set.  We 
also examine DBMS concurrency control algorithms, such as two phase locking, from the
viewpoint of KBMS transactions.

In the next chapter, we continue this cross-fertilization and show that DBMS query
optimization strategies can be used to optimize KBMS processing. We do this by applying
DBMS query optimization techniques to a KBMS transaction that evaluates a linear
recursive query, such as transitive closure.

7.1 A Performance Measure for the MME Cycle Implementation

Efficiency in executing a transaction is key to the success of a conventional DBMS.
Correspondingly,  efficiency  of  execution  of  the  MME  cycle  is  a  key  consideration  in  the 
implementation  of  the  KBMS.  One  goal  of  the  implementation  is  that  the  execution  of  the 
modified  KBMS  transaction,  by  the  MME  cycle,  should  be  comparable,  in  efficiency,  to  the 
execution  of  the  original  unmodified  transaction  (in  a  conventional  DBMS).  For  example,  if 
the  original  transaction  was  in  a  form  that  could  benefit  from  DBMS  query  optimization 
techniques  based  on  query  decomposition,  result  sharing,  parallelism,  pipelining,  etc.,  then 
these  same  benefits  should  potentially  apply  to  the  modified  KBMS  transaction  as  well. 
Details  of  optimization  techniques  are  given  in  Chapter  Eight. 

The  degree  of  performance  degradation  of  the  original  transaction  that  is  tolerable  in 
the  KBMS  environment  will  depend  on  the  type  of  rule  that  modifies  the  transaction  and  the 
enhancements  provided  through  its  use.  Modifications  caused  by  the  rules  usually  imply  pro- 
cessing overhead.  We  aim  to  isolate  the  modifications,  caused  by  the  rules,  from  the  original 
transaction.  This  will  reduce  interference  and  minimize  performance  degradation  of  the  ori- 
ginal transaction. 

There  is  always  the  possibility  that  rules  can  capture  information  that,  through  tran- 
saction modification,  enhances  or  optimizes  the  execution  of  the  original  transaction.  This 
possibility  has  not  been  considered  in  this  discussion. 

There has been much research in the optimization of DBMS transactions [FIN82,
KIM80, KIM84a, KIM84b, SEL86 and STO76], to improve execution efficiency. Most of this
research has concentrated on optimizing a single retrieval query. There has been some
research in the global optimization of multiple retrieval queries [KIM84a, KIM84b and
SEL86] but very little research in the optimization of storage manipulation operations
[SEL86], probably because few benefits are expected.

All  these  techniques  operate  in  a  compiled  mode;  they  assume  that  the  optimized 
DBMS  transaction  is  static;  i.e.,  it  does  not  get  modified  during  execution. 

Compared  to  a  DBMS  transaction,  a  KBMS  transaction  is  much  more  complex;  it 
comprises  retrieval  queries,  storage  manipulation  operations  and  value  dependent  rules  which 
themselves  comprise  retrieval  and  manipulation  operations.  A  KBMS  transaction  explicitly 
schedules  the  parallel  execution  of  operations  and  value  dependent  rules  and  marks  both 
operations  and  rules  as  being  conditionally  executed.  The  MME  cycle  cannot  ensure  that  the 
KBMS  transaction  will  remain  static  while  executing  the  value  dependent  rules.  In  fact,  it  is 
precisely  these  value  dependent  rules  that  capture  problem  solving  knowledge;  thus,  value 
dependent  modification  of  the  KBMS  transaction  is  an  important  feature  of  the  KBMS  which 
must  be  supported. 

This  results  in  a  dilemma;  if  for  efficiency  considerations,  we  wish  to  optimize  the  tran- 
saction, then  this  requires  that  the  optimized  KBMS  transaction  remain  static  during  execu- 
tion, whereas,  the  very  nature  of  knowledge  based  systems  requires  that  value  dependent 
transaction  modification  be  supported.  Consequently,  our  approach  to  optimization  and  exe- 
cution of  a  KBMS  transaction  is  a  compromise.  While  we  cannot  ensure  that  the  KBMS 
transaction,  as  a  whole,  remains  static,  what  we  can  do  is  identify  those  portions  of  the 
KBMS  transaction  that  have  already  been  modified  and  will  remain  static  during  execution. 
Now,  this  static  portion  of  the  KBMS  transaction  satisfies  the  required  assumptions  and  can 
be  optimized  using  conventional  DBMS  optimization  techniques. 

In  the  MME  cycle,  for  a  particular  execution  level  k,  optimization  takes  place  after  the 
match-modify  phases  and  the  "look-ahead"  process  complete.  The  KBMS  transaction  that  is 
to  be  optimized  will  comprise  retrieval  and  storage  manipulation  operations  as  well  as  value 
dependent  rules.  The  optimizer  will  not  try  to  optimize  storage  manipulation  operations.  In 
addition to the retrieval operations at execution level k, the retrieval operations introduced
at execution level (k+1), corresponding to the LHS of rules at level k (that have already been
passed through the match-modify phases of the MME cycle by the "look-ahead" process) are
also static and thus, candidates for optimization.

In the context of the KBMS, that part of the optimizer that deals with individual
retrieval queries will be unaffected; i.e., it will be no different from a conventional optimizer.
The global optimization of multiple retrieval queries will be handled somewhat differently
from a conventional DBMS. One difference is that those retrieval operations that are
explicitly specified to execute in parallel are obvious candidates for global optimization;
conventional DBMS do not explicitly execute parallel retrieval operations within a single
transaction.

When multiple DBMS queries are being globally optimized, it is advantageous to
decompose the queries, find common sub-expressions for sharing intermediate results and
sometimes even to re-order the retrieval queries within the transaction [BOR80, FIN82,
KIM84a and KIM84b]. There are two constraints that must be met by the optimizer to
maintain consistency and the semantics associated with the queries [KIM84a]. First, if
retrievals are nested, then this nested ordering must be maintained. Second, if retrieval
queries and manipulation operations are intermixed in the DBMS transaction and if the data
manipulation operations affect the data that is retrieved, then re-ordering of the retrieval
queries is permissible, only if these queries maintain the original ordering with respect to the
data manipulation operation after optimization. This is in the interest of maintaining
consistency. To make the optimizer less complex, only groups or sequences of queries that are
not separated by data manipulation operations are globally optimized.

In the case of the KBMS transaction, the second constraint is applied somewhat
differently, as compared to a DBMS. In addition to the storage manipulation operations
already in the KBMS transaction at execution level k, there is the possibility that the RHS of
the value dependent rules at execution level k can modify the transaction and append more
storage manipulation operations. So, we assume that any value dependent rule is a possible
source of storage manipulation operations. Consequently, only retrieval queries that are not
separated by either storage manipulation operations or (the RHS actions of) value dependent
rules are candidates for global optimization. The retrieval operations that are explicitly
executed in parallel will satisfy this constraint of not being separated by storage manipulation
operations or value dependent rules.

For example, in Figure 6.4, retrieval operations on the LHS of rules gen_constr_2 and
gen_hier_1 can be globally optimized. The RHS of these rules introduce storage
manipulation operations which separate these retrieval operations from the retrieval operations on the
LHS of rule loc_stat_1; thus, the retrieval operations of rule loc_stat_1 cannot be globally
optimized with the previous retrieval operations.
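
The grouping constraint can be illustrated with a minimal Python sketch. It is not part of the KBMS design: the step labels, the Step class, the function name and the rule names are all assumptions made for illustration. A transaction fragment is modeled as an ordered list of steps in which any storage manipulation operation, or the RHS action of any value dependent rule, acts as a barrier. Only runs of two or more retrievals between barriers are reported as candidates for global optimization.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical step labels; these are not KML constructs.
RETRIEVE = "retrieve"   # retrieval operation, including the LHS of a rule
BARRIER = "barrier"     # storage manipulation, or RHS action of a value dependent rule


@dataclass
class Step:
    kind: str   # RETRIEVE or BARRIER
    name: str


def global_optimization_groups(steps: List[Step]) -> List[List[str]]:
    """Return runs of two or more retrievals not separated by a barrier."""
    groups: List[List[str]] = []
    current: List[str] = []
    for step in steps:
        if step.kind == RETRIEVE:
            current.append(step.name)
            continue
        # A barrier closes the current run; a lone retrieval is left to
        # the single-query optimizer and is not reported here.
        if len(current) > 1:
            groups.append(current)
        current = []
    if len(current) > 1:
        groups.append(current)
    return groups


# Two rule LHS retrievals, then their RHS actions, then a third LHS retrieval.
fragment = [
    Step(RETRIEVE, "lhs_rule_a"),
    Step(RETRIEVE, "lhs_rule_b"),
    Step(BARRIER, "rhs_rule_a"),
    Step(BARRIER, "rhs_rule_b"),
    Step(RETRIEVE, "lhs_rule_c"),
]
print(global_optimization_groups(fragment))
```

In this fragment the first two retrievals form one group, while the third is cut off from them by the intervening RHS actions and is therefore optimized only as an individual query.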

Since  the  transaction  is  optimized  after  the  "look-ahead"  process,  the  retrieval  opera- 
tions introduced  on  the  LHS  of  the  value  dependent  rules  are  candidates  for  optimization. 
This  leads  to  an  interesting  method  for  the  optimization  of  linear  recursive  queries.  The 
transitive  closure  example  of  Chapter  Five  results  in  the  execution  of  a  recursive  cycle  of 
value  dependent  rules  that  crosses  execution  levels.  This  cycle  is  detected  by  the  "look- 
ahead"  process.  The  corresponding  KBMS  transaction  fragment,  described  in  Figure  6.6, 
comprises  parallel  retrieval  operations  at  different  execution  levels,  occurring  in  the  recursive 
cycles.  These  parallel  retrieval  operations  are  candidates  for  optimization.  We  postpone  this 
discussion  on  the  efficient  evaluation  of  linear  recursive  queries  in  the  KBMS  to  Chapter 
Eight,  where  a  detailed  analytical  study  is  presented. 

Those  operations  or  rules  that  are  conditionally  executed,  depending  on  the  outcome  of 
executing  some  value  dependent  rule,  are  excluded  from  the  optimization  process.  For  exam- 
ple, security  and  integrity  constraints,  that  are  usually  associated  with  the  pre-exec  option  of 
Figures  6.1  and  6.2,  represent  overhead  in  a  transaction.  The  execution  of  the  triggering 
operation  or  rule  is  often  conditional  on  the  constraint.  The  execution  of  the  triggering 
operation  or  rule  is  delayed  until  the  constraint  satisfactorily  completes  execution;  i.e.,  the 
execution of these constraints cannot be isolated from the triggering operation or rule. As a
result neither triggering retrieval operations nor rules (retrieval operations on their LHS)
that  are  associated  with  the  pre-exec  option  are  candidates  for  optimization. 

Other  constraints,  such  as  referential  constraints  for  example,  do  not  directly  affect  the 
execution  of  the  triggering  operation  or  rule.  Rules  that  enforce  these  constraints  use  either 
the  par-exec  or  the  post-exec  options  without  unduly  delaying  the  triggering  operation  or 
rule.  They  do  not  prevent  the  triggering  operation  or  rule  from  being  a  candidate  for  optimi- 
zation. The  rules  that  enforce  these  constraints  can,  however,  be  a  source  of  storage  mani- 
pulation operations  which  may  affect  the  optimizer,  as  previously  described. 

Rules  associated  with  deduction,  for  example,  rules  that  derive  new  object  occurrences 
on  the  RHS  to  satisfy  a  query,  etc.,  cannot  be  considered  overhead  since  they  provide  added 
functionality.  Although  enhancing  a  system  in  this  manner  usually  degrades  performance,  in 
this  case,  the  degradation  can  be  minimized  since  these  rules  can  often  be  executed  in  paral- 
lel with  the  triggering  operation.  These  rules  do  not  prevent  the  corresponding  triggering 
retrieval  operation  from  being  a  candidate  for  optimization.  In  addition  the  retrieval  opera- 
tions on  the  LHS  of  these  rules  are  also  candidates  for  optimization. 

To  summarize,  in  conventional  database  systems,  optimization  of  queries  within  a  tran- 
saction is  done  before  the  transaction  starts  execution  and  it  is  assumed  that  the  optimized 
transaction  is  static.  To  be  able  to  utilize  these  DBMS  optimization  techniques  we  must 
ensure  that  the  KBMS  transaction  also  is  static.  This  is  contrary  to  the  problem  solving 
nature  of  the  KBMS  environment  which  is  committed  to  support  value  dependent 
modification  of  the  transaction,  during  execution.  Our  compromise  solution  is  to  identify 
portions  of  the  KBMS  transaction  that  are  already  modified  and  are  guaranteed  to  remain 
static  and  optimize  only  these  static  portions  of  the  KBMS  transaction. 

Global  optimization  of  retrieval  operations  in  the  KBMS  transaction  is  beneficial  in 
situations  when  several  queries  access  the  same  object  occurrences,  as  in  the  case  of  linear 
recursive  queries.  This  technique  loses  its  advantage  if  the  retrieval  queries  are  unrelated  or 
if  they  are  separated  by  storage  manipulation  operations  (that  prevent  global  optimization). 
In these circumstances, we make use of an alternative technique, namely concurrent
execution. Concurrent execution of a set of DBMS transactions has been extensively
researched and has been found to enhance performance. If we can identify concurrent
KBMS transactions; i.e., sequences of operations that can be isolated and executed
concurrently, then we can exploit this technique of concurrent execution, in the KBMS.

The value dependent rules that are implicitly selected during the execute phase of the
MME cycle, after the changes to the knowledge base are committed, are worth investigation,
as candidates for concurrent execution. In the description of the MME cycle, we saw that all
the KML operations of the modified transaction were used collectively to implicitly select a
group of (possibly unrelated) value dependent rules. Conceptually, all these implicitly
selected value dependent rules can be executed in parallel. In other words, a set of
transactions corresponding to the implicitly selected, value dependent rules are candidates for
concurrent execution. Concurrent execution of KBMS transactions selected in the execution
phase of the MME cycle will be investigated in this chapter.

7.2 Review of Available Techniques for Processing Rules

The execution efficiency of OPS5, one of the most widely used production systems, has
been attributed to the efficiency of the Rete algorithm [FOR82]. This algorithm exploits the
property of temporal redundancy; i.e., the execution of a production or a rule in each cycle
results in very small changes to memory. Thus, the Rete algorithm minimizes computation
in each cycle by saving the state from each cycle in a discrimination net in the form of
tokens and using the saved state in the next cycle.

There are several characteristics of relevant knowledge bases and of the MME cycle
which are different from a PS environment; each of these differences will make the Rete
algorithm less attractive for use in the KBMS. We now describe the Rete algorithm and discuss
each of these differences.

The LHS of OPS5 productions are expressed as a conjunction of condition elements.
The Rete algorithm compiles these condition elements, in lexical ordering, into a binary
discrimination network. Working memory elements (or OPS5 data) that satisfy the
condition elements are input into the discrimination net and flow through its nodes. Each node of
the discrimination net stores tokens corresponding to working memory elements that satisfy
the net, i.e., the conjunction of condition elements below the node. Figure 7.1 shows an
example of a discrimination network corresponding to a production whose LHS is a
conjunction of condition elements C1, C2, ..., Cn. The output from the net is all the applicable
productions whose LHS is satisfied (plus the satisfying tokens) and is called the conflict set.
In each OPS5 cycle a single rule is selected from the conflict set for execution.

The  function  of  the  discrimination  net  is  analogous  to  indexing  working  memory  ele- 
ments using  the  rules.  The  net  is  also  an  inherently  redundant  storage  structure  since  it 
stores  a  token  for  each  memory  element  satisfying  a  production  and  a  single  memory  element 
could  simultaneously  satisfy  several  productions  [BEI86,  MIR84  and  SCH86]. 

An  engineering  design  database  is  characterized  by  complex  object  types  and  several 
occurrences  of  the  same  object  type.  Assume  that  each  occurrence  of  a  knowledge  base 
object  type  is  represented  by  a  working  memory  element  of  a  PS.  Then,  each  working 
memory  element  could  be  complex  since  object  types  could  be  complex.  Also,  there  will  be  a 
large  number  of  working  memory  elements  since  each  object  type  has  several  occurrences. 

Each  condition  element  on  the  LHS  of  the  productions  could  be  simultaneously  satisfied 
by  several  object  occurrences  of  a  single  object  type;  as  a  result  each  node  of  a  discrimination 
net  will  store  several  tokens.  In  addition,  a  single  occurrence  of  a  complex  object  type  could 
simultaneously  satisfy  several  rules;  i.e.,  the  same  token  could  be  stored  at  many  nodes.  The 
inherent  redundancy  of  the  discrimination  network  will  increase  and  will  result  in  a  decrease 
of  the  processing  efficiency  of  the  Rete  algorithm. 

Under  these  conditions  it  is  more  efficient  to  index  rules  by  object  type.  Satisfying  the 
LHS of a selected rule will retrieve object occurrences that are also indexed by object type.

Applying a rule changes working memory and updates the discrimination net. This,
too,  requires  access  to  object  occurrences.  The  discrimination  net  is  not  optimized  for 
efficient  access  to  object  occurrences  and  as  the  number  and  complexity  of  occurrences  to  be 
retrieved  increases,  the  performance  of  the  algorithm  decreases.  The  static  nature  of  the 
discrimination  net  forces  retrievals  to  be  performed  in  a  fixed,  lexical  order  [MIR84].  This 
prevents  optimization  of  retrievals  and  results  in  poor  performance. 

The  static  nature  of  the  discrimination  net  also  results  in  all  rules  having  equal  priority 
and  being  tested  in  each  cycle  to  see  if  they  must  be  applied.  In  contrast,  in  a  transaction 
oriented  KBMS,  the  explicit  selection  of  value  dependent  rules  restricts  the  number  of  rules 
that  must  be  tested  at  any  instant.   This  feature  cannot  be  exploited  by  the  Rete  algorithm. 

The  lack  of  support  for  parallel  execution  of  rules  is  a  major  drawback  of  current  PS. 
OPS5 arbitrarily selects and executes one rule from all the applicable rules in the conflict set.
An  important  assumption  made  by  the  Rete  algorithm  is  that  of  temporal  redundancy.  This 
assumes  that  the  knowledge  base  changes  incrementally,  with  the  application  of  the  selected 
rule.  However,  one  of  the  characteristics  of  a  design  database  is  that  several  object 
occurrences  of  the  same  object  type  could  simultaneously  satisfy  a  single  rule  [BEI86,  MIR84 
and  VAN86]  and  this  results  in  several  instantiations  of  the  same  rule  being  placed  in  the 
conflict  set  of  rules. 

In  the  interests  of  efficiency,  all  the  instantiations  of  a  single  rule,  and  perhaps  all  the 
applicable  rules  in  the  conflict  set  should  be  executed  concurrently,  provided  there  is  no 
conflict  between  the  instantiations.  The  Rete  algorithm  cannot  accommodate  concurrent 
execution of several instantiations of a single rule or of different rules. In addition,
concurrent execution would no longer maintain temporal redundancy and this will destroy the
benefits  of  the  Rete  algorithm.  The  lack  of  support  for  the  concurrent  execution  of  rules  in 
the  KBMS  is  a  major  drawback  of  the  Rete  algorithm. 

An  alternate  algorithm,  TREAT,  has  been  developed  for  the  DADO  machine  [MIR84]. 
The TREAT algorithm remembers the conflict set of applicable rules but does not save state
information in a discrimination net. TREAT recognizes the equivalence between verifying
the LHS of a production and retrieving corresponding memory elements. The LHS of
selected rules are treated as retrieval queries and working memory elements are retrieved to
satisfy the rules. This allows optimization of the retrieval queries. We will support a similar
strategy in our KBMS to identify object occurrences that satisfy the LHS of value
dependent rules. However, TREAT does not currently allow parallel execution of rules.

In contrast to the discrimination network, which indexes object occurrences by
production rules, in [STO85] and [STO86] a technique of setting a lock on rules by the data or tuples
that they affect is suggested. Rules that support forward chaining triggers, rules that infer
data for virtual fields of a relation and rules that support data objects which are query
language commands are supported by this scheme. When a tuple is inserted into the database
it sets a lock corresponding to the predicates (qualifiers in the rules) that it may be
covered by. In effect, each tuple is an index to those rules that may apply to it. Two
schemes,  one  involving  physical  locks  and  the  other  involving  a  special  data  structure  based 
on  R-trees  [GUT84],  are  suggested.  Then,  applying  rules  can  be  reduced  to  the  task  of 
finding  all  the  rules  locked  by  the  tuples  involved  in  a  query  and  processing  these  rules. 
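
The idea that each tuple acts as an index to the rules that may apply to it can be sketched in a few lines of Python. This is an illustrative stand-in only: it uses an in-memory dictionary rather than the physical locks or R-tree structure of the cited scheme, and the class name, method names and sample rules are assumptions.

```python
from collections import defaultdict


class RuleLockTable:
    """Toy model: every inserted tuple records the rules whose predicate covers it."""

    def __init__(self, rules):
        # rules: mapping of rule name -> predicate over a tuple (a dict here)
        self.rules = rules
        self.locks = defaultdict(set)   # tuple id -> names of covering rules

    def insert(self, tuple_id, row):
        # On insertion the tuple "sets a lock" for each rule that may cover it.
        for name, predicate in self.rules.items():
            if predicate(row):
                self.locks[tuple_id].add(name)

    def rules_for_query(self, tuple_ids):
        # Applying rules reduces to collecting the rules locked by the
        # tuples that the query touches.
        selected = set()
        for tuple_id in tuple_ids:
            selected |= self.locks.get(tuple_id, set())
        return selected


table = RuleLockTable({
    "high_salary": lambda row: row["salary"] > 100,   # specific to a few tuples
    "every_employee": lambda row: True,               # applies to the whole relation
})
table.insert(1, {"salary": 50})
table.insert(2, {"salary": 150})
print(sorted(table.rules_for_query([1])))
print(sorted(table.rules_for_query([1, 2])))
```

A rule such as the hypothetical "every_employee" is locked by every tuple, so the index gives no selectivity for it.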

This  locking  technique  is  very  efficient  when  a  rule  is  specific  to  a  few  tuples  of  a  rela- 
tion, since  it  will  be  accessed  only  when  those  tuples  are  processed.  However,  this  advantage 
disappears  when  a  rule  applies  to  the  entire  relation,  i.e.,  to  all  object  occurrences,  since 
there  is  no  advantage  to  the  lock.  There  is  another  drawback  to  this  technique.  For  exam- 
ple, if  a  rule  tests  a  non-indexed  attribute  of  a  relation,  then  in  order  to  avoid  insertion 
anomalies,  all  tuples  of  that  relation  will  set  a  lock  corresponding  to  that  rule.  In  this  case, 
the  rule  will  be  processed  for  all  tuples  in  the  relation  irrespective  of  whether  it  applies  or 
not.  Finally,  the  technique  does  not  seem  to  cover  inference  rules  that  derive  new  tuples 
(object occurrences). For all these reasons we have not considered a technique based on
locking, in our implementation.

7.3 Structuring Rules Within Object Types

A  basic  feature  in  the  design  of  our  knowledge  representation  model  was  that  the 
natural  structuring  of  knowledge  would  be  directly  supported;  i.e.,  data  and  rules  relevant  to 
data  must  be  stored  together,  within  a  knowledge  base  object  type.  The  resulting  clustering 
supports  an  efficient  implementation  of  the  MME  cycle.  To  support  this  clustering,  rules 
should  be  indexed  by  the  user  object  types  for  which  they  are  defined.  The  object 
occurrences should also be clustered by the corresponding user object type.

There  can  be  several  rules  defined  for  a  single  user  object  type;  these  rules  interact  with 
each other and are accessed during different phases of the MME cycle depending on their
classification,  as  seen  in  Chapter  Six.  To  aid  an  efficient  implementation,  rules  defined  for  a 
single  user  object  type  must  be  further  structured,  within  the  object  type,  and  this  structure 
must  reflect  and  exploit  these  differences  on  how  they  are  accessed  and  used.  The  category 
of  a  rule  based  on  its  LHS  construct,  the  relationship  between  rules  in  the  explicit  inference 
chains,  etc.,  are  all  meta-information  that  is  used  to  structure  rules,  within  the  object  types 
for  which  they  are  defined. 

Corresponding  to  the  distinction  in  a  conventional  DBMS  between  the  definition  of  the 
database  (dictionary)  and  the  database  (that  stores  the  actual  data)  is  the  distinction,  in  the 
KBMS  environment,  between  the  definition  of  the  knowledge  base  and  the  knowledge  base 
(the  occurrences  of  the  user  object  types).  The  definition  of  the  knowledge  base  will  contain 
the  definition  of  each  user  object  type.  Rules  defined  for  a  user  object  type  apply  to  all 
occurrences  of  the  user  object  type  and  are  a  part  of  the  definition  of  the  knowledge  base. 
We will refer to the definition of the knowledge base in the KBMS environment as the
"def-kb."

The knowledge base itself stores objects that are occurrences of each defined user
object type and will be referred to as the "sto-kb." The KML operations and value dependent
rules of the KBMS transaction are executed against the object occurrences of the sto-kb.
Each object occurrence is a unique object and has a unique object identifier, which was
referred to as the OID, in previous chapters.

The  storage  structures  of  the  sto-kb  can  use  a  simple  hashing  technique  to  cluster  the 
occurrences  corresponding  to  each  user  object  type.  Each  stored  object  occurrence  also  con- 
tains an  index  value  into  the  def-kb  to  determine  its  corresponding  object  type.  Thus,  given 
a  user  object  type  of  the  def-kb,  the  OID  values  for  the  corresponding  object  occurrences  of 
the  sto-kb  can  be  obtained,  and  given  an  OID  value  of  an  occurrence  in  sto-kb,  the 
corresponding  user  object  type  from  def-kb  can  also  be  determined. 
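
The two-way navigation between object types and occurrences can be sketched as follows. This is a minimal in-memory illustration, not the storage design itself: Python dictionaries stand in for the hashing technique and for disk-resident clusters, and the class name, method names and sample types are assumptions.

```python
from collections import defaultdict
from itertools import count


class StoKB:
    """Toy sto-kb: occurrences clustered by user object type, each with a unique OID."""

    def __init__(self, def_kb):
        self._def_kb = def_kb                 # type name -> type definition (the def-kb)
        self._clusters = defaultdict(dict)    # type name -> {OID: occurrence}
        self._type_of = {}                    # OID -> type name (index into the def-kb)
        self._oids = count(1)                 # generator of unique OIDs

    def store(self, type_name, occurrence):
        if type_name not in self._def_kb:
            raise KeyError(f"undefined object type: {type_name}")
        oid = next(self._oids)
        self._clusters[type_name][oid] = occurrence
        self._type_of[oid] = type_name
        return oid

    def oids_of(self, type_name):
        # Given a user object type, obtain the OIDs of its occurrences.
        return sorted(self._clusters.get(type_name, {}))

    def type_of(self, oid):
        # Given an OID, determine the corresponding user object type.
        return self._type_of[oid]


kb = StoKB({"Part": {}, "Assembly": {}})
bolt = kb.store("Part", {"name": "bolt"})
frame = kb.store("Assembly", {"name": "frame"})
nut = kb.store("Part", {"name": "nut"})
print(kb.oids_of("Part"), kb.type_of(frame))
```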

Rules  that  are  defined  for  a  particular  occurrence  of  a  user  object  type  are  stored  in  the 
sto-kb  as  a  part  of  the  corresponding  object  occurrence,  as  the  value  of  an  attribute  of 
abstract  data  type  RULE. 

The  def-kb  must  be  memory  resident  or  at  least  pre-fetched  from  the  disk;  thus,  it  is 
subject  to  both  time  and  space  constraints,  whereas  the  sto-kb  is  located  on  disk  and  is  sub- 
ject only  to  time  constraints. 

Rules  in  the  def-kb  are  accessed  during  almost  every  phase  of  the  MME  cycle  and  are 
used differently, depending on their classification. For example, during the match phase,
operations  and  value  dependent  rules  in  the  KBMS  transaction  are  matched  against  the 
value  independent  rules  in  the  def-kb.  Value  dependent  rules  stored  in  the  def-kb  are  also 
accessed  during  the  "look-ahead"  process.  Whenever  the  MME  cycle  is  re-invoked,  the 
operations  and  rules  appended  at  higher  execution  levels  are  matched  against  the  value  in- 
dependent rules  in  the  def-kb.  Also  during  execution,  the  operations  executed  against  the 
sto-kb  are  used  to  implicitly  select  value  dependent  rules  from  the  def-kb. 

In  Chapter  Four,  we  distinguished  between  rules,  on  the  basis  of  the  semantics  they 
captured.   From  an  implementation  viewpoint,  we  can  further  categorize  rules  as  follows: 

(1)  value  independent  rules  that  test  the  execution  status  of  operations 

(2) value dependent rules that are explicitly selected for execution

(3) value dependent rules that are implicitly selected for execution

(4)  value  independent  rules  that  test  the  execution  status  of  value  dependent  rules 

Each  of  these  categories  of  rules  will  be  structured  differently  within  the  object  type 
definitions  of  the  def-kb,  to  reflect  differences  in  usage. 

Let  us  consider  the  first  subset  of  value  independent  rules  that  test  the  execution  status 
of  operations  against  the  user  object  types.  In  the  match  phase  of  the  MME  cycle,  the 
triggering  operations  in  the  transaction  are  simply  matched  against  the  LHS  of  these  rules. 
Matching uses both the operation type and the object type; for a particular operation
against an object type, all the matching rules must be obtained.

If a user object type has only a small number of these rules, so that they all fit into
one page or block (i.e., they are smaller than the unit of access to the
def-kb), then these rules can be clustered on the object type alone. However, if these
rules exceed a single unit of access, then they must be clustered on both the object type
and the operation type, so as to make their retrieval more efficient. They
need not be indexed by rule name since the MME cycle does not access these rules using the
rule name. When we use the term clustered, we assume that the system uses some form of
hashing to cluster; i.e., it allocates contiguous storage for a group of rules that have the
same value for the operation type and object type tested on the LHS. This clustering favors
efficient retrieval of the relevant rules.
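
As an illustration of clustering on the pair (object type, operation type), the following minimal sketch uses a hash table in place of contiguous storage. The class name, the method names and the second and third rule names are assumptions made for the example only.

```python
from collections import defaultdict


class RuleClusters:
    """Groups value independent rules by the (object type, operation) tested on their LHS."""

    def __init__(self):
        # The dictionary hash stands in for the hashing scheme assumed in the text.
        self._buckets = defaultdict(list)

    def add(self, object_type, operation, rule_name):
        self._buckets[(object_type, operation)].append(rule_name)

    def match(self, object_type, operation):
        """Return every rule triggered by `operation` against `object_type`."""
        return list(self._buckets.get((object_type, operation), ()))


clusters = RuleClusters()
clusters.add("TOP_SECRET_PROJ", "INSERT", "T5")
clusters.add("TOP_SECRET_PROJ", "INSERT", "rule_a")   # hypothetical rule name
clusters.add("TOP_SECRET_PROJ", "DELETE", "rule_b")   # hypothetical rule name
```

No index on rule name is kept, which mirrors the observation that the match phase never looks these rules up by name.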

Next,  we  consider  the  subset  of  value  dependent  rules  defined  for  a  user  object  type 
which  are  explicitly  selected  via  the  inference  chains.  For  efficiency  reasons,  it  is  convenient 
to  cluster  all  the  rules  in  an  explicit  inference  chain  together  since  they  are  accessed  together; 
i.e.,  a  value  dependent  rule  must  be  stored  together  with  the  rules  that  explicitly  select  it  for 
execution.  Rules  in  an  inference  chain  are  often  defined  for  the  same  user  object  type  and 
this means they can be clustered within the object type for which they are defined. For
example, in Figure 6.3, rule loc_stat_1 is defined for the user object type
TOP_SECRET_PROJ and it is executed by rule T9, which is also defined for the same object
type. As a result, both rules can be clustered within this object type and stored in def-kb.
When rule T9 is matched and retrieved, rule loc_stat_1 is also accessed. Rule loc_stat_1
need not be indexed by rule name unless its execution status is being tested by some other
value independent rules, as will be discussed later.

In contrast, some value dependent rules occur in more than one inference chain and are
explicitly selected by several rules defined for different user object types. For example, rule
gen_hier_1, defined for GOVT_PROJECT, was explicitly selected by a value independent
rule T7, defined for TOP_SECRET_PROJ. The same rule gen_hier_1 can also be explicitly
selected by value independent rules T7' and T7", defined for the constituents of
GOVT_PROJECT, i.e., MILITARY_PROJ and NON_MILITARY_PROJ. See Figure 7.2
for rules relevant to GOVT_PROJECT and its constituents.

In this case, there are two alternative clustering strategies for the rules involved. The
explicitly selected rule gen_hier_1 can be replicated and a copy stored with all three object
types in the def-kb, clustered with the corresponding rule (T7, T7' or T7") that executes it.
Now, if any one of these three rules is retrieved, a copy of gen_hier_1 is also accessible; this
method uses more space in def-kb.

Alternatively, a rule such as gen_hier_1 can be stored once with the object type for which
it is defined, i.e., GOVT_PROJECT. Since a rule may be accessed by rule name in several
explicit inference chains, an index into these value dependent rules defined for an object type
must be maintained, based on the value of rule name. Then rules T7, T7', etc., will be
stored with the different object types for which they are defined. They would access rule
gen_hier_1 using the index on rule name or directly via pointers. This strategy requires less
space but needs two accesses, although the second access could be a fast, direct access via pointers.
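
The space versus access trade-off between the two strategies can be made concrete with a small sketch. The helper functions and the counting of "accesses" are illustrative assumptions; the object types and rule names follow the example in the text.

```python
# Selecting rule defined for each object type (names follow the example in the text).
SELECTORS = {
    "TOP_SECRET_PROJ": "T7",
    "MILITARY_PROJ": "T7'",
    "NON_MILITARY_PROJ": "T7''",
}


def replicated_layout(selectors, rule):
    """Strategy 1: keep a copy of `rule` in the cluster of every rule that selects it."""
    return {otype: {selector, rule} for otype, selector in selectors.items()}


def shared_layout(selectors, rule, home_type):
    """Strategy 2: keep `rule` once with its defining type, plus an index on rule name."""
    clusters = {otype: {selector} for otype, selector in selectors.items()}
    clusters.setdefault(home_type, set()).add(rule)
    index = {rule: home_type}
    return clusters, index


def copies(clusters, rule):
    """Number of stored copies of `rule` across all clusters."""
    return sum(rule in cluster for cluster in clusters.values())


def accesses(clusters, selecting_type, rule, index=None):
    """Cluster accesses needed to reach `rule` after a match in `selecting_type`."""
    if rule in clusters[selecting_type]:
        return 1            # found alongside the selecting rule
    if index is None or rule not in index:
        raise KeyError(rule)
    return 2                # selecting cluster first, then the indexed home cluster


replicated = replicated_layout(SELECTORS, "gen_hier_1")
shared, name_index = shared_layout(SELECTORS, "gen_hier_1", "GOVT_PROJECT")
```

With three selecting object types, replication stores three copies and needs one access, while the shared layout stores one copy and needs two.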

Next,  we  consider  the  subset  of  value  dependent  rules  that  are  implicitly  selected  for 
execution.  They  too  are  stored  in  def-kb  and  must  be  clustered  for  efficient  selection.  The 
implicit selection process does not use value independent rules with specific triggering information;
thus, the chances of false firing with implicit selection are much higher than with
explicit selection. Implicit selection is an example of a forward inference chain; i.e., a rule is
implicitly  selected  on  the  basis  of  the  condition  on  its  LHS.  The  LHS  is  a  value  dependent 
condition  involving  one  or  more  object  occurrences  of  user  object  types  and  a  rule  is  selected 
on  the  basis  of  user  object  types  and  occurrences. 

The  subset  of  value  dependent  rules  that  are  to  be  implicitly  selected  will  be  clustered 
in  the  def-kb  using  the  object  types  mentioned  on  the  LHS  of  the  rules.  Given  any  user  ob- 
ject type,  all  the  rules  in  this  subset  which  test  occurrences  of  that  particular  user  object 
type  can  be  obtained.  During  execution,  in  the  MME  cycle,  the  operations  and  rules  of  a 
KBMS  transaction  are  executed  against  the  object  occurrences  in  sto-kb.  The  storage  mani- 
pulation operations  (INSERT,  DELETE,  UPDATE,  DERIVE,  etc.)  in  the  transaction  are 
the  seed  or  starter  used  for  implicit  selection. 

When an object occurrence of the sto-kb is affected by a storage manipulation operation,
its OID value is saved and used later to determine the corresponding user object type.
This is done using the index into the def-kb stored with affected object occurrences in sto-kb.
For efficiency considerations and to maintain transaction orientation, this is done once after
all the operations of the KBMS transaction are executed and the changes are committed.
First, the set of affected OID values is obtained and the corresponding set of affected user
object types is determined. This set of affected user object types is then used
to access the def-kb and retrieve all the value dependent rules that are indexed using these
object types and may be applicable. Since we aggregate all the OID values, a rule will be
selected only once for each value of object type.
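
A minimal sketch of this post-commit selection step follows. The function name and the dictionary-based stand-ins for the sto-kb type index and the def-kb rule clusters are assumptions; the rule names are borrowed from the Chapter Six fragment for illustration.

```python
def implicitly_select(affected_oids, type_of_oid, rules_by_type):
    """Map affected OIDs to object types, then collect each candidate rule once."""
    # Aggregating OIDs into a set of types means each type is consulted only once.
    affected_types = {type_of_oid[oid] for oid in affected_oids}
    selected = []
    for object_type in sorted(affected_types):
        for rule in rules_by_type.get(object_type, ()):
            if rule not in selected:      # a rule reached via two types is kept once
                selected.append(rule)
    return selected


# Hypothetical contents: type index stored with occurrences, rules clustered by type.
type_of_oid = {101: "TOP_SECRET_PROJ", 102: "TOP_SECRET_PROJ", 201: "GOVT_PROJECT"}
rules_by_type = {
    "TOP_SECRET_PROJ": ["gen_constr_2", "loc_stat_1"],
    "GOVT_PROJECT": ["loc_stat_1"],
}
selected = implicitly_select([101, 102, 201], type_of_oid, rules_by_type)
```

The sketch only produces candidate rules; whether a candidate actually fires still depends on evaluating its LHS against the sto-kb.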

Consider for example the value dependent rules in Figure 7.3. Assume that these rules
are candidates for implicit selection; i.e., they do not occur in any explicit inference chains.
The rule inter_xul_l, defined for the user object type der_WKST_TASK, tests a condition
involving two object types PROD_JOB and WORKSTATION. One alternative is to store
two copies of this rule, clustered with the definition of these two object types. Now, the rule
can be obtained directly using either object type. The other alternative is to store a single
copy of this rule, clustered with the object type der_WKST_TASK and build an index into
these rules using the rule name. The index value, i.e., the rule name, will be stored with the
user object types WORKSTATION and PROD_JOB, in def-kb. One advantage of the
second alternative is that it is easier to detect when the same rule is selected simultaneously
by several seed operations, e.g., INSERT operations into both WORKSTATION and
PROD_JOB.

A more interesting case involves the implicit selection and execution of the transitive
closure rules trans_cl_1 and trans_cl_2, of Figure 7.3, that are defined for the user object
type der_P*S_P. The two rules can be clustered with the der_P*S_P object type and
indexed using rule name. The appropriate index values will also be stored with the definition
of the P*S_P object type. When an occurrence is inserted into the object type P*S_P, the
two rules will be implicitly selected and executed. If the LHS of either rule (or both) is
satisfied, then der_P*S_P occurrence(s) will be derived. If occurrences of der_P*S_P are
derived, then this will implicitly select rule trans_cl_2; this process continues until no new
occurrences of der_P*S_P are derived. This is a bottom-up method to compute the transitive
closure and it is inefficient since it is not goal-oriented, and always computes the complete
transitive closure.
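
The bottom-up computation can be sketched as a fixpoint iteration. The sketch assumes, without confirmation from the figures, that the first rule copies base occurrences into the derived type and that the second joins derived occurrences with base occurrences; the pair encoding and the sample data are hypothetical.

```python
def bottom_up_closure(base_pairs):
    """Derive all (part, subpart) pairs reachable through the base relation."""
    derived = set(base_pairs)                 # assumed effect of the first rule
    while True:
        # Assumed effect of the second rule: extend a derived pair by one base step.
        new_pairs = {
            (part, sub)
            for (part, mid) in derived
            for (mid2, sub) in base_pairs
            if mid == mid2
        } - derived
        if not new_pairs:                     # no new occurrences derived: stop
            return derived
        derived |= new_pairs


base = {("car", "engine"), ("engine", "piston"), ("piston", "ring")}
closure = bottom_up_closure(base)
```

Every pair is derived even if a query asks only about one part, which is the lack of goal orientation noted in the text.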

Recall the example transaction fragment of Chapter Six where the two rules
gen_constr_2 and loc_stat_1 were implicitly selected. The rules are described in Figure 7.3.
The two rules will be clustered with the object type GOVT_PROJECT, for which they are
defined and indexed using rule name. The values of the index will be stored with the
definition of the object type TOP_SECRET_PROJ. The fragment executes INSERT operations
against both TOP_SECRET_PROJ and GOVT_PROJECT. When an occurrence is
inserted into TOP_SECRET_PROJ, its OID value will be used to determine its object type.
Using the index values for rule name, in the definition of TOP_SECRET_PROJ, the two
rules will be implicitly selected. INSERT into GOVT_PROJECT will select loc_stat_1
again, but it will be executed just once.

Now, we consider the subset of value independent rules that test the execution status of
value dependent rules. During the match phase of the MME cycle, the names of value
dependent rules in the KBMS transaction are matched against this subset of rules. These
value independent rules are stored in def-kb, clustered with the value dependent rules they
test. Consequently, they will be accessed together with the corresponding value dependent
rules. All value dependent rules whose execution status is matched by at least one value
independent rule must also be indexed by rule name, to facilitate this process.

Finally, we consider those rules defined for particular object occurrences and stored in
sto-kb with the occurrences. They are explicitly selected for execution and they are specified
in the inference chain, using the OID value of the object occurrence (or some selection criteria
on the other attributes of the object occurrence) and the attribute name which stores
the rule. Figure 7.4 has an example of an attribute ORDER_RULE whose abstract data
type is RULE, defined for the object PART_DEF. A rule military_constr_1 selects appropriate
object occurrences of PART_DEF and explicitly executes the rules stored as the value of
ORDER_RULE, for these occurrences. Figure 7.4 describes an example value of
ORDER_RULE.

During execution of the KBMS transaction, these rules can be retrieved from the sto-kb
and directly executed. Due to the restrictions placed on these rules, they do not have to pass
through the MME process to match against any value independent rules and the scope of the
LHS condition and RHS operation of these rules is restricted to the particular object. We
impose these restrictions because the rule is stored in sto-kb and is accessed while the KBMS
transaction is being executed. If this rule were to be matched against other value independent
rules or were to modify the transaction, it would require access to the rules in the def-kb
and re-invoking the MME cycle. The cost of doing this for each rule defined for particular
object occurrences is prohibitive; hence the restrictions.

7.4    Identifying the Scope of a Rule

The  value  dependent  rules  are  executed  against  the  object  occurrences  stored  in  the 
sto-kb.  The  LHS  of  a  value  dependent  rule  accesses  the  sto-kb  and  retrieves  object  oc- 
currences that  satisfy  its  LHS.  These  object  occurrences  may  be  directly  described  or  named 
occurrences,  and  this  usually  implies  that  they  will  be  manipulated  further,  in  the  KBMS 
transaction.  On  the  other  hand  these  occurrences  may  be  used  to  verify  a  condition  which 
returns  a  boolean  value,  in  which  case  it  is  their  existence  that  is  used  rather  than  the  actual 
occurrences. 

The  triggering  KML  operations  (or  rules)  that  explicitly  select  value  dependent  rules  or 
the  seed  KML  operations  that  implicitly  select  value  dependent  rules,  specify  the  context  for 
this  retrieval  operation.  Identifying  the  context  for  retrieval  is  essential  if  the  KBMS  is  to  be 
efficiently  implemented.  In  the  case  of  explicit  selection,  parameter  passing  is  used  to  bind 
variables  in  the  explicitly  selected  rule.  Both  forward  and  backward  inference  chains  are 
supported,  depending  on  the  direction  of  variable  binding  as  will  be  seen  shortly.  In  the  case 
of  implicitly  selected  value  dependent  rules,  the  seed  operations  specify  the  context  for  re- 
trieving object  occurrences  to  satisfy  the  LHS  conditions.  Here  the  binding  is  implicit  and 
only  supports  forward  inference  chains. 

Let  us  consider  how  the  triggering  operations  (explicit  selection)  or  the  seed  operations 
(implicit selection) specify the context for the retrieval operations. For example, if the operation
inserts (or derives) object occurrences into the sto-kb, then the operation provides a
complete description of the inserted (derived) object occurrences. These inserted (derived)
occurrences are used to bind variables in the selected rule. In Figure 6.3, rule T5, triggered
by inserting an occurrence X into TOP_SECRET_PROJ, will select rule gen_constr_1 and
provide the scope for the X occurrence and some restrictions for the Y occurrence.

When  the  operation  modifies,  deletes  or  retrieves  object  occurrences,  then  the  operation 
provides a partial description of the relevant occurrences. Value dependent rules selected by
the operation will use this partial description when satisfying their LHS. For example, in
Figure 6.5, a retrieval operation against der_P*S_P may provide bindings for the attribute
PART. It will match with rules T4 and T4' and will provide bindings for the X occurrence.
In the explicit inference chain that follows, parameter passing will use this description to
bind the Z occurrence on the RHS of rules trans_cl_1 and trans_cl_2 and through this the Y
and P occurrences of P*S_P and der_P*S_P, respectively, on the LHS of these rules. Thus,
retrievals  to  satisfy  the  LHS  of  these  two  rules  will  use  this  qualification  on  the  attribute 
PART.  In  this  example,  the  presence  of  a  backward  chain  of  inference  resulted  in  variable 
binding  from  the  RHS  of  the  selected  rule  (goal  side)  to  its  LHS. 

Similarly,  a  triggering  rule  can  select  value  dependent  rules  for  execution.  The  scope  of 
the  object  occurrences  will  include  all  the  occurrences  retrieved  to  satisfy  the  LHS  of  the 
triggering  rule  and  the  occurrences  affected  by  the  RHS  consequent  of  the  triggering  rule. 

In  implicit  selection,  operations  are  used  as  a  seed  for  selecting  rules.  Operations  are 
executed  against  the  sto-kb,  and  the  OID  values  of  the  affected  object  occurrences  are  used  in 
two  ways.  First,  using  these  OID  values,  the  corresponding  user  object  types  are  determined 
and  from  this  the  relevant  value  dependent  rules  indexed  using  these  user  object  types  are 
selected.  Secondly,  the  OID  values  and  the  corresponding  object  occurrences  of  the  sto-kb 
are  used  to  bind  variables  in  these  implicitly  selected  rules,  as  described  previously  for  the  ex- 
plicitly selected  rules.  This  provides  the  context  for  further  retrievals  to  satisfy  the  LHS  of 
these  rules. 

The  complexity  of  the  LHS  conditions  of  the  implicitly  selected  rules  determines  the 
complexity  of  the  bindings.  Bindings  are  not  explicitly  specified  in  the  rules  by  parameters 
and  must  be  determined  by  the  KBMS.  Ease  of  binding  must  be  considered  in  designing  the 
language  constructs  of  the  KML  in  which  rules  are  expressed.  This  is  beyond  the  scope  of 
this dissertation.

7.5    Issues Concerning the Concurrent Execution of Rules

In  this  section,  we  investigate  the  possibility  of  exploiting  concurrency  to  improve 
KBMS  execution  efficiency.  In  the  MME  cycle,  several  value  dependent  rules  were  implicitly 
selected  in  the  execute  phase  and  scheduled  to  execute  simultaneously.  By  isolating  each  of 
these  value  dependent  rules,  together  with  its  own  retrieval  and  manipulation  operations, 
into  a  KBMS  transaction,  we  can  generate  a  set  of  concurrent  KBMS  transactions.  Conse- 
quently, there  may  be  opportunities  to  interleave  the  execution  of  these  transactions,  thereby 
improving  execution  efficiency  of  the  MME  cycle. 

We  first  identify  sources  of  parallelism  in  the  MME  cycle  that  are  appropriate  for  iso- 
lating parallel  transactions.  Next,  we  discuss  the  serializability  criterion  for  the  correctness 
of  a  set  of  concurrently  executing  DBMS  transactions.  We  then  apply  this  serializability  cri- 
terion to  a  set  of  concurrently  executing  transactions  in  the  KBMS.  We  show  the 
equivalence  between  the  interleaved  execution  of  a  set  of  concurrent  transactions  and  a  par- 
ticular serial  execution  of  the  same  set  of  transactions.  Finally,  we  consider  a  different  meas- 
ure of  comparison  between  a  serial  execution  and  a  concurrent  execution;  this  measure  esti- 
mates the  number  of  possible  choices  of  execution  schedules. 

7.5.1    Sources of Parallelism Within the MME Cycle

Value  dependent  knowledge  rules  are  either  explicitly  or  implicitly  selected  for  execu- 
tion. Both  methods  can  schedule  a  rule  so  that  it  executes  in  parallel  with  other  operations 
and  rules. 

In Figure 6.6, rules trans_cl_1 and trans_cl_2 are explicitly selected to execute in parallel
each time an operation RETRIEVE from der_P*S_P is executed. Although these rules
execute in parallel, they would benefit more from global optimization than from interleaved
execution. They are selected by the same triggering operation and they retrieve object occurrences
of the same object type; thus, they benefit from sharing intermediate results
through optimization. In addition, when the par-exec option is used, the triggering operation
or rule must be synchronized with the parallel execution of the RHS consequent (in this case
the two rules trans_cl_1 and trans_cl_2). If rules are isolated into transactions, it would be
hard to synchronize their execution and termination.

In  contrast,  when  the  operations  in  a  KBMS  transaction  are  used  collectively  to  impli- 
citly select  value  dependent  rules,  these  rules  can  also  execute  in  parallel  but  they  are  not 
constrained  to  synchronize  their  execution.  Since  they  are  selected  by  different  operations 
they  are  often  dissimilar  and  may  not  all  access  the  same  object  occurrences.  This  results  in 
less  contention,  but  also  in  little  benefit  from  sharing  results.  These  implicitly  selected  rules 
are  therefore  good  candidates  for  interleaved  execution  rather  than  global  optimization. 

In Figure 6.4, two rules gen_constr_2 and loc_stat_1 are implicitly selected by the
operation INSERT into TOP_SECRET_PROJ. The rule gen_constr_2 retrieves object occurrences
from MILITARY_PROJ whereas loc_stat_1 retrieves occurrences from
GOVT_PROJECT. By isolating the operations executed by these two rules into independent
transactions, their execution can be interleaved. In a typical MME cycle, a number of
operations in a single KBMS transaction will collectively select several such value dependent
rules.

The  implicit  selection  and  execution  of  these  value  dependent  rules  is  an  example  of  a 
forward  chain  of  inference  and  resembles  the  selection  and  execution  of  productions  in  an 
OPS-style  production  system  (PS)  environment.  Consequently,  the  benefits  that  accrue  from 
concurrent  execution  of  value  dependent  rules  could  potentially  apply  to  production  systems, 
as  well. 

7.5.2     The Serializability Criterion for Concurrent DBMS Transactions

Extensive research in the concurrent execution of DBMS transactions and various protocols
and algorithms for concurrency control have been summarized in CER84 and ULL84.
Consider a set of DBMS transactions, executing in the DBMS. If each transaction were to
maintain  database  consistency,  then  the  serial  execution  of  these  transactions  also  maintains 
consistency.  The  actual  final  state  would  depend  on  the  particular  serial  execution  order  of 
these  transactions. 

Each transaction is represented by a sequence of elementary operations and we can
represent a transaction, $T_i$, in the simplest case, by a read operation $R_i$ followed by a write
operation $W_i$. The data items (or object occurrences in an object oriented DBMS) that are
read and written by each transaction are its read-set and write-set, respectively.

We can define a log, L, for a set of transactions, $T_i$, to represent the order in which the
elementary operations are executed. A log is serial if all the elementary operations of each
transaction are executed consecutively, e.g., $R_1$, $W_1$, $R_2$, $W_2$, etc., corresponding to a serial
execution order of transactions $T_1$, $T_2$, etc. A log can also be constructed to represent the
interleaved execution of the elementary operations of the set of transactions, e.g., $R_1$, $R_2$,
$W_1$, $R_3$, $W_2$, etc. A log, L, representing the interleaved execution of a set of transactions, is
said to be serializable if it is computationally equivalent to some serial log for the
same set of transactions. Algorithms for determining if two logs are computationally
equivalent (and corresponding proofs) are in BER79, CER84, ESW76 and ULL84. The
serializability criterion of correctness states that the interleaved execution of a set of transactions
maintains consistency of the database if the log representing its execution is serializable.
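
To make the notion of a log concrete, the sketch below tests a log for conflict serializability by building a precedence graph and checking it for cycles. This is a swapped-in, stricter test than the computational equivalence used in the cited works: every log it accepts is serializable, but it may reject some serializable logs. The tuple encoding of log entries is an assumption made for the example.

```python
def is_conflict_serializable(log):
    """log: list of (transaction, op, item) with op in {"R", "W"}, in execution order."""
    edges = {}
    for i, (t1, op1, item1) in enumerate(log):
        edges.setdefault(t1, set())
        for t2, op2, item2 in log[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if t1 != t2 and item1 == item2 and "W" in (op1, op2):
                edges[t1].add(t2)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {t: WHITE for t in edges}

    def has_cycle(t):
        color[t] = GREY
        for nxt in edges[t]:
            if color[nxt] == GREY or (color[nxt] == WHITE and has_cycle(nxt)):
                return True
        color[t] = BLACK
        return False

    # The log is conflict serializable exactly when the precedence graph is acyclic.
    return not any(color[t] == WHITE and has_cycle(t) for t in edges)


serial_log = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
bad_log = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]
disjoint_log = [("T1", "R", "x"), ("T2", "R", "y"), ("T1", "W", "x"), ("T2", "W", "y")]
```

In `bad_log` both transactions read x before either writes it, so each must precede the other and no equivalent serial order exists.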

In  general,  the  problem  of  determining  if  a  given  log  is  serializable  is  NP-hard.  In  prac- 
tice, various  concurrency  control  mechanisms  or  protocols  have  been  devised  to  solve  a 
different  task;  this  task  is  to  ensure  that  only  serializable  logs  are  produced  if  the  protocol  is 
followed  by  a  set  of  concurrently  executing  transactions.  Of  course,  these  protocols  are  not 
complete;  i.e.,  they  do  not  generate  all  possible  serializable  logs  for  a  given  set  of  transactions 
and as a result, the concurrency they allow is limited.
Of these protocols, the most simple and popular is the two phase lock or 2PL protocol.
This  protocol  requires  that  a  transaction  obtain  a  "lock"  on  an  object  occurrence  before  it 
attempts  to  read  or  write  it.  Depending  on  the  sophistication  of  the  DBMS,  read  locks  (R- 
locks)  and  write  locks  (W-locks)  can  be  differentiated  and  the  granularity  of  the  locks  can 
vary.  For  example,  in  an  object  oriented  environment,  locking  granularity  may  range 
through  all  occurrences  of  an  object  type,  a  single  object  occurrence,  an  object  occurrence 
and  its  component  occurrences,  etc.  From  the  viewpoint  of  the  2PL  protocol  the  transaction 
must  comprise  two  phases.  The  first  is  a  growing  phase  where  all  the  locks  are  obtained  and 
the  second  is  a  shrinking  phase  where  the  locks  are  released.  If  a  transaction  is  2PL  it  is  not 
allowed to obtain any new locks once a lock has already been released. It can easily be
shown that a 2PL log, i.e., a log produced by the interleaved execution of a set of 2PL
transactions, is always serializable [BER79, CER84, ESW76 and ULL84]. However, the 2PL
protocol is not complete; i.e., it cannot guarantee that it will generate all possible serializable
logs.
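
A minimal sketch of a lock table that enforces the two-phase rule follows. It assumes shared R-locks, exclusive W-locks and occurrence-level granularity; the class and method names are hypothetical, and a refused request simply returns False where a real system would delay the transaction.

```python
class TwoPhaseLockTable:
    """Grants R-locks and W-locks and rejects any lock request after a release."""

    def __init__(self):
        self.readers = {}        # item -> set of transactions holding R-locks
        self.writer = {}         # item -> transaction holding the W-lock
        self.shrinking = set()   # transactions that have released at least one lock

    def _check_growing(self, txn):
        if txn in self.shrinking:
            raise RuntimeError(f"{txn} violates 2PL: lock requested after a release")

    def r_lock(self, txn, item):
        self._check_growing(txn)
        if self.writer.get(item, txn) != txn:
            return False                       # another transaction holds the W-lock
        self.readers.setdefault(item, set()).add(txn)
        return True

    def w_lock(self, txn, item):
        self._check_growing(txn)
        if self.writer.get(item, txn) != txn:
            return False                       # another transaction holds the W-lock
        if self.readers.get(item, set()) - {txn}:
            return False                       # other transactions hold R-locks
        self.writer[item] = txn
        return True

    def unlock(self, txn, item):
        self.shrinking.add(txn)                # the shrinking phase starts here
        self.readers.get(item, set()).discard(txn)
        if self.writer.get(item) == txn:
            del self.writer[item]


table = TwoPhaseLockTable()
```

For example, two transactions can share an R-lock on the same occurrence, but neither can obtain a W-lock on it until the other has released its R-lock.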

A  concurrency  control  mechanism  that  maintains  the  2PL  protocol  will  grant  any 
number of R-locks for the same object occurrences, simultaneously, to the transactions. If an
R-lock is held on some object occurrences, then no W-lock can be obtained on those occurrences
until all R-locks on them are released. Once a W-lock is obtained, no further
R-locks or W-locks will be granted on that occurrence until the W-lock is released. In
essence,  the  2PL  protocol  maintains  serializability  by  delaying  the  execution  of  transactions 
when  there  are  conflicting  requests  for  R-locks  and  W-locks.  In  the  worst  case,  the  2PL  log 
will  be  a  serial  log.  We  will  make  use  of  the  2PL  protocol  in  our  discussion  on  applying  the 
serializability criterion to knowledge rules.

7.5.3     Applying the Serializability Criterion to KBMS Transactions

We  now  examine  how  the  serializability  criterion  can  be  applied  in  the  context  of  a 
KBMS  to  maintain  the  correctness  of  a  knowledge  base  during  the  concurrent  execution  of  a 
set  of  value  dependent  rules.  We  first  place  some  constraints  on  the  system  so  that  a 
transaction,  corresponding  to  a  rule,  resembles  a  simple  DBMS  transaction,  i.e.,  it  remains 
static  during  execution;  this  makes  the  description  simpler.  We  can  relax  these  constraints 
to  show  that  the  techniques  apply  without  these  restrictions  [RAS87]. 

The  two  constraints  are  as  follows: 

(1) The RHS action of a rule cannot execute another rule; i.e., no explicit inference chains
can  be  specified. 

(2)  Value  independent  rules  cannot  match  with  the  operations  executed  by  the  value 
dependent  rule  to  modify  the  transaction;  i.e.,  the  operations  in  the  transaction  do  not 
have  to  go  through  the  match  and  modify  phases  of  the  MME  cycle,  before  execution. 

Given these restrictions, each transaction will comprise some read operations (R-locks),
corresponding to the retrieval operations on the LHS of the value dependent rule, followed
by some processing to verify the condition on the LHS, followed by some write operations
(W-locks), corresponding to the actions on the RHS of the rule. In Chapter Four, the RHS
actions of the rules were classified into three categories and for the present we disallow the
category corresponding to explicit inference chains. The RHS of the rule could now derive
new occurrences using the DERIVE operation and would require a W-lock on the corresponding
object type(s). The RHS of the rule could also execute some storage manipulation operations
and would require a W-lock on either the object type or occurrences.

It  is  possible  that  an  implicitly  selected  rule  may  not  actually  be  applicable;  i.e.,  the 
"succ-exec"  flag  will  not  be  set.  If  so,  then,  after  obtaining  the  necessary  R-locks 
(corresponding  to  the  retrieval  operations  on  the  LHS  of  the  rule)  and  determining  that  its 
LHS condition is not satisfied, the transaction will not attempt to obtain any of the W-locks
(corresponding to the RHS of the rule) and will release all R-locks previously obtained. This
is unlike a conventional DBMS transaction, which does not execute its operations conditionally.
This  situation  does  not  occur  in  a  PS  that  uses  the  Rete  network,  either,  since  such  a  PS  exe- 
cutes only  those  productions  already  guaranteed  satisfiable. 
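
The conditional shape of such a transaction can be sketched as follows. The function signature, the dictionary used as a stand-in for the sto-kb and the returned lock trace are all assumptions made for illustration; the returned boolean plays the role of the "succ-exec" flag.

```python
def run_rule(read_set, condition, write_actions, store):
    """Run one value dependent rule as a transaction; return (succ_exec, lock trace)."""
    # R-locks for the retrievals on the LHS of the rule.
    locks = [("R", item) for item in read_set]
    values = {item: store.get(item) for item in read_set}

    if not condition(values):
        # LHS not satisfied: no W-lock is requested; R-locks are released on return.
        return False, locks

    # W-locks for the actions on the RHS of the rule.
    for item, action in write_actions.items():
        locks.append(("W", item))
        store[item] = action(values)
    return True, locks


# Hypothetical occurrence attributes and rule.
quiet_store = {"budget": 50, "status": "open"}
fired_quiet, quiet_locks = run_rule(
    ["budget"], lambda v: v["budget"] > 100, {"status": lambda v: "flagged"}, quiet_store
)

busy_store = {"budget": 500, "status": "open"}
fired_busy, busy_locks = run_rule(
    ["budget"], lambda v: v["budget"] > 100, {"status": lambda v: "flagged"}, busy_store
)
```

The first call acquires only an R-lock and leaves the store unchanged; the second goes on to acquire a W-lock and update the occurrence.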

Intuitively,  it  seems  that  executing  such  a  transaction  which  does  not  obtain  any  W- 
locks  on  object  occurrences  does  not  change  any  values  in  the  knowledge  base  and  should  not 
affect  the  execution  of  any  other  transactions  in  the  KBMS.  In  reality,  however,  executing 
these transactions could make an otherwise serializable schedule nonserializable
[RAS87]. 

7.5.3.1    Equivalence between a serial PS and a concurrent KBMS

With  this  scenario  in  mind,  we  compare  a  serial  PS  with  the  concurrent  KBMS.  Given 
an initial set $\Psi_1$ of transactions, each element of which corresponds to an implicitly selected
value  dependent  rule,  we  can  compare  the  serial  execution  of  these  transactions  in  a  serial  PS 
environment  with  their  interleaved  execution  in  a  concurrent  KBMS  environment. 

In a serial PS, in each step $i$, a single transaction $T_i$ is arbitrarily selected from the set
and applied. As a result of applying $T_i$, the PS will (optionally) determine that some other
transactions in the set will no longer be applicable and these will be deleted from the set.
Let the set of transactions deleted in step $i$ be $\Delta del_i$. Also as a result of applying $T_i$, the PS
will determine that some additional transactions are now applicable, as well. Let the set of
transactions added in step $i$ be $\Delta add_i$. The new set of candidate transactions in step $(i+1)$ is

$$\Psi_{i+1} = \{\Psi_i - T_i - \Delta del_i + \Delta add_i\}.$$

This process will continue until finally in step $F$, the set $\Psi_F$ is empty. Note that in the
serial PS, each step corresponds to the execution of a single transaction.
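The update of the candidate set in a serial PS can be illustrated with a short Python sketch. It is a sketch under assumptions, not the PS implementation: the function name `run_serial_ps`, the `effects` table and the transaction names are hypothetical, and `min` merely stands in for whatever conflict-resolution strategy the PS uses.

```python
def run_serial_ps(initial, effects, select=min, max_steps=1000):
    """Simulate Psi_{i+1} = Psi_i - T_i - del_i + add_i until the set is empty.

    initial: iterable of transaction names (the set Psi_1).
    effects: dict mapping a transaction to (deleted, added) sets; hypothetical data.
    select:  strategy that picks one transaction from the current set.
    """
    psi = set(initial)
    schedule = []
    while psi and len(schedule) < max_steps:   # guard against non-terminating rule sets
        t = select(psi)                        # arbitrary choice of T_i
        deleted, added = effects.get(t, (set(), set()))
        psi = (psi - {t} - set(deleted)) | set(added)
        schedule.append(t)
    return schedule


# Applying T1 deletes T3 from the set and makes T4 applicable.
schedule = run_serial_ps({"T1", "T2", "T3"}, {"T1": ({"T3"}, {"T4"})})
```

With the lexicographic `min` strategy the resulting schedule is T1, T2, T4; each loop iteration corresponds to one step of the serial PS.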

The selection of each $T_i$ is arbitrary. OPS5, for example, makes this selection based on
the recency of working memory elements satisfying a production and the number of conjunctive
conditions on the LHS of the production. Since this selection is arbitrary and determined
by the particular PS, it is entirely possible that in step 2, $T_2$ is selected from the set
$\{\Psi_1 - T_1 - \Delta del_1\}$, which is the set $\{\Psi_2 - \Delta add_1\}$. In other words, $T_2$ could also be selected
from the initial set $\Psi_1$ and not from the added set of transactions $\Delta add_1$. Similarly, in
subsequent steps $i$, $T_i$ can be selected from the set

$$\Bigl\{\Psi_1 - \sum_{j=1}^{i-1} (T_j + \Delta del_j)\Bigr\},$$

which is the same as the set

$$\Bigl\{\Psi_i - \sum_{j=1}^{i-1} \Delta add_j\Bigr\}.$$

If the selection is as described, then, eventually after some $f_1$ steps, the serial PS will
have executed a sequence of $f_1$ transactions $T_1, T_2, \ldots, T_{f_1}$ where each $T_i$ happens to be an
element of the initial set $\Psi_1$. After step $f_1$, all transactions in $\Psi_1$ are either executed or
deleted and the set of applicable transactions for step $(f_1 + 1)$, $\Psi_{f_1+1}$, is the set
$\{\sum_{i=1}^{f_1} \Delta add_i\}$, i.e., all the transactions added in the $f_1$ previous steps, which were not
selected previously. In step $(f_1 + 1)$, $T_{f_1+1}$ is chosen from this set $\Psi_{f_1+1}$.

We have gone through this exercise with the serial PS to compare it with a concurrent
KBMS. Given an initial set of transactions, $\Psi_1$, the serial PS can arbitrarily select and
execute an initial sequence of transactions all of which happen to be in $\Psi_1$. Given this same
initial set $\Psi_1$, a concurrent KBMS would interleave the execution of this set of transactions. If
an appropriate protocol is used, such as the 2PL protocol described earlier, and the resulting
schedule is serializable, then it must be equivalent to some serial schedule $T_1, T_2, \ldots$, etc.,
where each $T_i$ must be from the initial set $\Psi_1$.
In other words, the concurrent KBMS will execute an equivalent serial schedule which
may be the same as the serial schedule arbitrarily selected by the serial PS.

After the interleaved execution of the set $\Psi_1$ completes, if the equivalent serial schedule
of the concurrent KBMS is the same as the serial schedule of the serial PS, then a sequence
of $f_1$ transactions would have been executed. The KBMS will now collectively use the
operations executed by these $f_1$ transactions to implicitly select a second "initial" set of
transactions, i.e., value dependent rules. This second initial set will be identical to the set
$\Psi_{f_1+1}$ which was available to the serial PS in step $(f_1 + 1)$. Thus, the executions of the
serial PS and the concurrent KBMS are found to be equivalent.

The reason for exploring the possibility of concurrency within the MME cycle was to
find ways to improve execution efficiency. The number of steps taken for executing a set of
transactions is the measure that has the greatest impact on execution efficiency. Each step
in the serial PS corresponds to the sequential execution of all the operations within a
transaction, and f transactions will take f steps to execute. By interleaving the execution of the
transactions, we mean that the operations within the transactions are interleaved. As a
result, in the concurrent KBMS, object occurrences can be accessed simultaneously by
several transactions and the interleaved execution may be more efficient than a serial
execution; f transactions may take fewer than f serial steps. In the worst case, the interleaved
execution in the KBMS may be of no benefit and will take the same number of steps as the serial
PS. However, there is a possibility that the interleaved execution will take fewer steps than
the serial PS and will be more efficient.

In general, concurrent execution has been found to be of significant benefit; however, it
is difficult to estimate what the actual benefits of the interleaved execution are, since this
depends on the specific locking patterns of the transactions, the specifics of the 2PL protocol,
etc. Such a study is beyond the scope of this dissertation.

In the next section, we briefly discuss a different, but related, measure of comparison
for the serial PS and the concurrent KBMS. This measure is the number of possible choices
of serial execution schedules that are covered in both cases.

7.5.3.2    Estimating the number of execution schedules

There is another measure of comparison of the serial PS and the concurrent KBMS
that involves an estimation of the number of possible choices for selecting a transaction in
any step and the resulting number of different execution schedules, in each case. This
measure is of interest if different execution schedules provide different final results in the
knowledge base.

The serial PS is not constrained to execute in the fashion that we just described, where
the initial sequence of transactions is limited to be in the initial set $\Psi_1$. In fact, in each step
$i$, all the transactions in the set $\Psi_i$, which was previously defined as follows:

$$\Psi_{i+1} = \{\Psi_i - T_i - \Delta del_i + \Delta add_i\}$$

are candidates for execution. If we use $|\Psi_i|$ to denote the size of the set $\Psi_i$, i.e., the number
of transactions in $\Psi_i$, then in each step $i$, $|\Psi_i|$ is the number of available choices for selecting
a transaction. We can use this to determine the number of serial execution schedules. Let
$NS_f$ represent the number of choices of possible serial schedules, corresponding to a sequence
of $f$ transactions, $T_1, T_2, \ldots, T_f$. Then $NS_f$ is given by the following:

$$NS_f = \prod_{i=1}^{f} |\Psi_i|$$

where $|\Psi_{i+1}| = |\Psi_i| - 1 - |\Delta del_i| + |\Delta add_i|$.

The value of $NS_f$ can be estimated for different conditions as follows:

(a) If for each $i$, $|\Delta add_i| - |\Delta del_i| = 0$, then $NS_F = |\Psi_1|! = F!$, where $F$ is the final step
when all the transactions are executed.

(b) If for each $i$, $|\Delta add_i| - |\Delta del_i| < 0$, then $NS_F < |\Psi_1|!$, where $F$ is as before.

(c) If for each $i$, $|\Delta add_i| - |\Delta del_i| > 0$, then for any value of $f > |\Psi_1|$, $NS_f > |\Psi_1|!$ and in
this case, execution never ceases since the set of transactions (and the possible choices)
never falls below $|\Psi_1|$ and may even increase monotonically!
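The three cases can be checked numerically with a small Python sketch. The function name `ns` and the sample sizes are assumptions chosen for illustration; the function simply multiplies the sizes of the candidate sets produced by the recurrence for the size of the candidate set given above.

```python
from math import factorial, prod


def ns(psi1_size, del_sizes, add_sizes):
    """Return (NS_f, sizes), where sizes lists |Psi_i| for each step taken."""
    sizes = []
    size = psi1_size
    for d, a in zip(del_sizes, add_sizes):
        if size <= 0:                 # the candidate set is empty: execution stops
            break
        sizes.append(size)
        size = size - 1 - d + a       # |Psi_{i+1}| = |Psi_i| - 1 - |del_i| + |add_i|
    return prod(sizes), sizes


case_a = ns(4, [0, 0, 0, 0], [0, 0, 0, 0])        # additions equal deletions
case_b = ns(4, [1, 0, 0, 0], [0, 0, 0, 0])        # deletions dominate
case_c = ns(4, [0, 0, 0, 0, 0], [1, 1, 1, 1, 1])  # additions dominate, f = 5 steps
```

For an initial set of four transactions, case (a) gives 4! schedules, case (b) gives fewer, and case (c) already exceeds 4! after five steps because the candidate set never shrinks.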

We now estimate the number of execution schedules for the concurrent KBMS. Here
we actually use the number of equivalent serial schedules, namely $NC_f$ for a sequence of $f$
transactions, rather than the number of serializable schedules. This is because there is no
one-to-one, onto mapping between serializable schedules and serial schedules; several
serializable schedules can be equivalent to a single serial schedule.

In the concurrent KBMS, the initial sequence of transactions in the equivalent serial
schedule must all be in the initial set $\Psi_1$. This is because when execution is interleaved, at
any stage, there may be several transactions that are partially executed, and the KBMS
cannot easily determine when a particular transaction has completed execution, but must wait
until the set of transactions completes execution. So, if in the equivalent serial schedule, $T_1$ is
the first transaction to be executed, the KBMS still cannot determine exactly when it
completes execution. Consequently, the KBMS cannot determine what new transactions are
added by executing any transactions in $\Psi_1$, i.e., $\{\Delta add_i\}$, until completing the interleaved
execution of all transactions in $\Psi_1$. These added transactions are not included in the possible
choices until the set $\Psi_1$ completes execution, so at each step, the choices are reduced by
$|\Delta add_i|$, when compared to the serial PS.

Another feature of the KBMS could also affect the estimation of the number of serial
execution schedules. When applying one transaction makes another no longer applicable, a
serial PS actually deletes this transaction from consideration. However, the KBMS does not
explicitly delete the affected transactions, i.e., the set $\{\Delta del_i\}$, and they will also be choices
for execution. They will correspond to rules whose LHS is not satisfied, as discussed
previously, and their RHS actions will not be executed. Although these transactions are possible
choices and are actually selected and executed by the KBMS, we will not count them in our
estimation of possible choices for two reasons. First, under certain circumstances, they can
make a schedule nonserializable, as discussed in [RAS87]. Second, they have no effect on the
values of the knowledge base since their RHS actions are not executed.


Let $|\Psi_1|$ be the number of transactions in the initial set and let $f_1$ of these transactions
be actually executed; i.e., the LHS of the rest are not satisfied. At each step $i$, the number of
choices for the concurrent KBMS is $|\Psi_1| - i - |\Delta del_i|$. After the $f_1$ transactions are
executed we use the operations executed by them, collectively, to form a new initial set $\Psi_{f_1+1}$ of
added transactions. Let $NC_f$ be the estimated number of possible equivalent serial schedules
of $f$ transactions in the concurrent KBMS. It will be defined as follows:

$$NC_f = \prod_{i=0}^{f_1-1} \bigl(|\Psi_1| - i - |\Delta del_i|\bigr) \times \prod_{i=0}^{f_2-1} \bigl(|\Psi_{f_1+1}| - i - |\Delta del_{f_1+i}|\bigr) \times \cdots$$

where $\Psi_{f_1+1} = \sum \Delta add_i$ is a new initial set of transactions made applicable after executing
the set $\Psi_1$, and $f = f_1 + f_2 + \cdots$

If the value of $|\Delta del_i|$ is 0 for all $i$, then

$$NC_f = |\Psi_1|! \times |\Psi_{f_1+1}|! \times \cdots$$
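A short Python sketch makes the difference between the two estimates concrete. The helper names `nc_batch` and `nc` and the sample numbers are hypothetical; each batch stands for one "initial" set that the concurrent KBMS runs to completion before the added transactions become choices.

```python
from math import factorial, prod


def nc_batch(psi_size, del_sizes):
    """One factor of NC_f: the product over i = 0, 1, ... of (|Psi| - i - |del_i|)."""
    return prod(psi_size - i - d for i, d in enumerate(del_sizes))


def nc(batches):
    """NC_f for a list of (|Psi|, [|del_0|, |del_1|, ...]) batches."""
    return prod(nc_batch(size, dels) for size, dels in batches)


# Hypothetical run: 3 initial transactions that together add 2 new ones; no deletions.
concurrent = nc([(3, [0, 0, 0]), (2, [0, 0])])

# Serial PS on the same data, if both additions appear after step 1:
# |Psi_i| = 3, 4, 3, 2, 1.
serial = prod([3, 4, 3, 2, 1])
```

Under these assumed numbers the concurrent KBMS covers 3! x 2! = 12 equivalent serial schedules, while the serial PS has 72 possible schedules, because its added transactions become candidates immediately.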

The comparison between these two estimates, $NS_f$ and $NC_f$, for the serial and
concurrent execution, respectively, may become clearer if we display these estimates graphically.
Figure 7.5 plots the number of possible choices available for selecting a transaction at each
execution step. Figure 7.5a is for the serial PS and Figure 7.5b is for the concurrent KBMS.

We have several curves because we can only estimate the shapes of the curves; we have
no exact information about the sizes of $|\Delta add_i|$ and $|\Delta del_i|$, which will determine the exact
nature of the curves.

As can be seen from the figures, the serial PS has a larger number of available choices
because at each step it determines what new transactions are made available as a result of
the current execution step. In the concurrent KBMS, however, the execution of a set of
transactions is interleaved and the choices are limited to transactions currently in the set.
This particular measure is only relevant if the number of different execution schedules that
are covered by the KBMS is critical. For example, in real-time control applications the
response to some changes should be immediate, and must not be delayed until current
transactions that are already selected and perhaps partially executed finish execution. However,
this must be balanced against the overall execution efficiency of the concurrent KBMS.

To summarize, we showed that the interleaved execution of a set of transactions, each
representing an implicitly selected value dependent rule in a concurrent KBMS, is equivalent
to a serial execution schedule in a PS type environment. The interleaved execution may
improve execution efficiency and in the worst case it will be no worse than a serial strategy
(neglecting any locking overhead). In contrast, when comparing an estimate of the number
of possible serial execution schedules in both cases, the serial PS had a larger number of
choices at each execution step, compared to the concurrent KBMS. We qualify this by
noting that this measure has no effect on the execution efficiency of the KBMS.

Figure 7.1    Example of a Discrimination Net Corresponding to an OPS5 Production Rule
[Figure labels: single input nodes (α nodes); two input merge nodes (β nodes); final output to conflict set of applicable productions]

T7 : TOP_SECRET_PROJ

IF par-exec(INSERT an occurrence X into TOP_SECRET_PROJ)
THEN (EXECUTE gen_hier_1(X) : GOVT_PROJECT)

T7' : MILITARY_PROJ

IF par-exec(INSERT an occurrence X into MILITARY_PROJ)
THEN (EXECUTE gen_hier_1(X) : GOVT_PROJECT)

T7" : NON_MILITARY_PROJ

IF par-exec(INSERT an occurrence X into NON_MILITARY_PROJ)
THEN (EXECUTE gen_hier_1(X) : GOVT_PROJECT)

gen_hier_1(X) : GOVT_PROJECT

IF (for occurrence X inserted into a constituent of GOVT_PROJECT
    NOT_SET_MEMBER(X.OID, GOVT_PROJECT.OID) )
THEN (INSERT an occurrence P into GOVT_PROJECT
    where P.OID = X.OID)


Figure 7.2    Knowledge Rules for GOVT_PROJECT and its Constituents




inter_rul_1 : der_WK_ST_TASK

IF (there exist occurrences X and Y of PROD_JOB and WORKSTATION
    such that SET_MEMBER(X.OPERATION, Y.WK_ST_OPS.OPERATION) )
THEN (DERIVE an occurrence Z into der_WK_ST_TASK
    where Z.PROD_JOB.OID = X.OID and Z.WORK_STATION.OID = Y.OID
    and Z.OPERATION = X.OPERATION
    and Z.OP_TIME = F.OP_TIME where F is an occurrence in set
    Y.WK_ST_OPS such that F.OPERATION = X.OPERATION )

trans_cl_1 : der_P*S_P

IF (there exists an occurrence X of P*S_P)
THEN (DERIVE occurrence Z of der_P*S_P such that Z = X)

trans_cl_2 : der_P*S_P

IF (there exist occurrences X and Y of der_P*S_P and P*S_P, respectively, such that
    SET_MEMBER(Y.PART, X.SUB_PARTS) )
THEN (DERIVE occurrence Z of der_P*S_P where Z.PART = X.PART
    AND Z.SUB_PARTS = the union of X.SUB_PARTS and Y.SUB_PARTS)

gen_constr_2 : GOVT_PROJECT

IF (there exists an occurrence X of TOPSECRET_PROJ such that
    NOT_SET_MEMBER(X.OID, MILITARY_PROJ.OID) )
THEN (INSERT an occurrence Z into MILITARY_PROJ where Z.OID = X.OID )

loc_stat_1 : GOVT_PROJECT

IF (there exist occurrences X and Y of TOPSECRET_PROJ and GOVT_PROJECT
    respectively, such that EQUAL(X.OID, Y.OID) AND EQUAL(Y.STATUS, "testing")
    AND NOT_EQUAL(Y.LOCATION, "Virginia") )
THEN (UPDATE Y such that Y.LOCATION = "Virginia")


Figure 7.3    Value Dependent Knowledge Rules for Implicit Selection

milit_constr_1 : PART_DEF

IF (for an occurrence X of PART_DEF,
    EQUAL (X.PART_DESC, "military equipment") )
THEN (EXECUTE X.ORDER_RULE)

an example occurrence of ORDER_RULE : PART_DEF

IF (GREATER_THAN (PROD_COST, 1500) AND
    LESS_THAN (QTY_IN_STOCK, 10) )
THEN (alert the KBMS)


Figure 7.4    Example of an Attribute of Abstract Data Type RULE

Figure 7.5    Graphical Estimate of the Number of Possible Choices for the Serial PS
and the Concurrent KBMS: (a) Serial PS, (b) Concurrent KBMS
[Both panels plot the number of possible choices against execution step i.]


CHAPTER VIII
A METHOD FOR THE EFFICIENT EVALUATION OF LINEAR RECURSIVE QUERIES
IN THE INTEGRATED KBMS

In  this  chapter,  we  examine  benefits  of  the  functional  integration  of  a  DBMS  and  a  rule 
processing  system  within  the  integrated  KBMS.  We  focus  on  query  optimization  techniques 
from  conventional  DBMS  technology  and  see  how  these  techniques  can  be  used  in  the  MME 
cycle  to  optimize  execution  of  multiple  retrieval  queries  in  a  KBMS  transaction. 

A KBMS transaction that generates the transitive closure of a relation is an example of
the use of a deductive rule that generates new information. Transitive closure is an example
of a linear recursive query which cannot be evaluated in a conventional DBMS. The rules
defining the closure are expressed using KML retrieval and manipulation operations. Thus,
when these rules are applied in the MME cycle, the set of resolvents generated by a linear
recursive rule is equivalent to a KBMS transaction that evaluates a set of concurrent KML
retrieval operations. This can be seen in the KBMS transaction fragment of Figure 6.6,
corresponding to the transitive closure of the relation defined for the der_P*S_P object type.

Recursive  cycles  of  value  dependent  rules  are  identified  during  the  "look-ahead"  process 
that  precedes  the  execute  phase  of  the  MME  cycle.  Optimizing  the  KBMS  transaction  fol- 
lows the  "look-ahead"  process.  Consequently,  the  retrieval  operations  generated  by  the 
recursive  rules  are  candidates  for  optimization.  We  use  query  optimization  techniques, 
developed  for  use  in  a  DBMS,  to  optimize  these  retrievals,  generated  by  the  recursive  rules, 
in  the  KBMS  transaction.  This  leads  to  an  efficient  evaluation  strategy  for  linear  recursive 
queries  and  illustrates  how  DBMS  strategies  can  be  exploited  in  the  MME  cycle. 

In the next section, we review a variety of evaluation strategies that have been
proposed to deal with recursive queries [HEN84, HAN86, NAQ83 and ULL85] and in section
8.2, we describe three known DBMS query optimization techniques, namely query
decomposition, intermediate result sharing, and pipelining and data-flow based approaches to
query execution. In section 8.3, we use these DBMS optimization techniques to develop a
strategy for the evaluation of a simple recursive rule defining the transitive closure of a
database relation. We identify operations that are candidates for parallel evaluation as well as
operations that can benefit from result sharing. In section 8.4, we generalize this strategy
and present an algorithm for evaluating any linear recursive intension.

An analytical study in section 8.5 measures the effect of pipelining on this evaluation
strategy. In that section we use the response time for each resolvent and the execution time
for a set of resolvents as performance measures to examine the performance gain due to the
data-flow and pipelined approach to query processing. This analysis assumes a suitable
multi-processor environment with an interconnection network for result sharing. Section 8.6
contains a brief summary. Results of this evaluation are also reported in [RAS86].

In describing the evaluation strategy and results, we use an example (binary) relation
rather than an object type defined using the SAM* association types. The rules are
represented as Horn clauses. Most of the research in recursive queries and query
optimization has used the relational model, and we found it simpler to use a relation to explain our
strategy and relate it to the existing research.

8.1    Review  of  Methods  for  Evaluating  Recursive  Queries 

It is assumed that the reader is familiar with the relationship between logic
programming and relational databases [BRO84, GAL83, JAR84a, JAR84b, REI78a, REI78b] and the
resolution principle in theorem proving [ROB65]. A first-order database is a function-free
first-order theory in which the extensional database (EDB) is a set of ground (having no
variables) positive unit clauses corresponding to the data in the relations of the database. If we
consider a Horn database, then the intensional database (IDB) is a set of Horn definite
clauses with exactly one positive literal. Each clause of the IDB represents a definition of
some of the tuples named in its positive literal, which could also be an EDB predicate. For
example,

    P(x,z) :- Q(x,y), R(y,z)

says that the appropriate join (over y) of Q and R is contained in P. The set of tuples in P is
the union of all tuples provided by each intensional clause defining P as well as all EDB
tuples if P is an EDB predicate as well.
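As a small illustration of how an intensional clause reads as a relational expression, the following Python sketch evaluates P(x,z) :- Q(x,y), R(y,z) as a join over y followed by a projection, and then adds any EDB tuples stored directly for P. The relation contents and the function name `derive_p` are assumptions made for the example.

```python
def derive_p(q, r, p_edb=frozenset()):
    """Tuples of P: the join of Q and R over y, plus P's own EDB tuples."""
    joined = {(x, z) for (x, y1) in q for (y2, z) in r if y1 == y2}
    return joined | set(p_edb)


Q = {("a", "b"), ("a", "c")}
R = {("b", "d"), ("c", "e"), ("f", "g")}
P = derive_p(Q, R, p_edb={("h", "i")})
```

Here P contains (a,d) and (a,e) from the clause and (h,i) from the EDB; the tuple (f,g) of R contributes nothing because no tuple of Q joins with it.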

Straightforward methods for query evaluation are insufficient in the presence of
recursive definitions. Recent research has focused on this problem [CHA81, HAN86, HEN84,
MIN81, NAQ83, ULL85 and WHA87]. [ULL85] reports methods for implementing queries that are
expressed using first order logic as a collection of Horn clauses. A rule/goal tree
is built using the rules (Horn clauses) and goals (terms). A rule/goal tree is equivalent to an
expression in relational algebra and, for a finite tree, a bottom up evaluation will build a
relation at each node until the root is evaluated. Recursive rules result in potentially infinite
rule/goal trees. The paper presents a "limit of trees" process to evaluate infinite trees.

Significant  contributions  of  [ULL85]  include  the  use  of  capture  rules  which  specify 
under  what  circumstances  a  node  (of  a  rule/goal  graph  constructed  using  the  logical  rules) 
can  be  evaluated  and  provide  an  efficient  implementation  strategy  for  evaluating  these  trees. 
Two  methods  to  terminate  rules  that  involve  recursion  are  given;  one  takes  advantage  of  the 
finiteness  of  the  domains  and  this  is  the  method  we  adopt. 

[NAQ84] addresses the problem of deriving a set of database retrieval requests that gives
the correct answers to a query involving a recursive statement and is guaranteed to
terminate. In this work, the clauses of the IDB are represented as a connection graph (CG)
[SIC76]. A recursive intension occurs in a CG as a special form of cycle called a potential
recursive loop (PRL). Only PRLs lead to database retrievals containing recursive statements,
and algorithms for detecting PRLs are well known [SIC76].

Consider an example database relation such as
Edge(start_node, end_node) which has a tuple for each direct edge between two nodes in a
graph. Then, the transitive closure of the relation Edge would be a relation, Reach, which
has a tuple for any two nodes in the graph that have a path between them. The following
clauses will define the transitive closure, Reach, of the Edge relation:

    Reach(x1,y1)  :-  Edge(x1,y1)                      (1)

    Reach(x1,z1)  :-  Reach(x1,y1), Edge(y1,z1)        (2)

Using the CG and resolving around the PRL, a query such as Reach(?,c), where c is a
constant; i.e., a query that retrieves all nodes that have a path to node c, will generate the
following resolvents:

    Edge(?,c)                                  (3)

    Edge(?,y1), Edge(y1,c)                     (4)

    Edge(?,y2), Edge(y2,y1), Edge(y1,c)        (5)    etc.
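The pattern of these resolvents can be made explicit with a short Python sketch. The function names `resolvent` and `evaluate` and the sample Edge relation are hypothetical; the sketch only generates the textual form of the k-th resolvent and evaluates it by k successive selections and joins on Edge.

```python
def resolvent(k, c="c"):
    """Return the k-th resolvent (k >= 1) of Reach(?, c) as a list of Edge literals."""
    # The chain of terms is ?, y_{k-1}, ..., y1, followed by the constant c.
    terms = ["?"] + [f"y{j}" for j in range(k - 1, 0, -1)] + [c]
    return [f"Edge({a},{b})" for a, b in zip(terms, terms[1:])]


def evaluate(k, edge, c):
    """Answers to the k-th resolvent: nodes with a path of exactly k edges to c."""
    frontier = {c}
    for _ in range(k):
        frontier = {s for (s, e) in edge if e in frontier}
    return frontier


EDGE = {("a", "b"), ("b", "c"), ("d", "c")}
third = resolvent(3)
answers = [evaluate(k, EDGE, "c") for k in (1, 2, 3)]
```

For this acyclic sample relation the third resolvent returns no answers, which is the point at which evaluation can stop; a cyclic relation needs a termination test based on the finiteness of the domain instead.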

A general algorithm for retrieving answers from the database, based on these resolvents,
is presented in [NAQ84]. The algorithm consists of an outer loop (corresponding to each
resolvent) and two inner loops. Initially, using selection on the database relation Edge, values for
y1 will be pushed on a queue. Then, for each resolvent, all answers will be extracted in the
first inner loop (using appropriate join, selection and projection operations) and the
corresponding set of values for the next resolvent, e.g., the values for y2, will be queued in
the second inner loop.
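The loop structure just described can be rendered loosely in Python. This is a simplified sketch and not the algorithm of [NAQ84] itself: the function name `reach_to` is hypothetical, the queue is seeded with the constant c rather than with the selected y1 values, and termination relies on a `seen` set, which exploits the finiteness of the domain.

```python
from collections import deque


def reach_to(edge, c):
    """All nodes with a path to c, evaluated one resolvent at a time."""
    answers = set()
    queue = deque([c])       # join values for the current resolvent
    seen = {c}               # values already queued; a finite domain ensures termination
    while queue:             # outer loop: one pass per resolvent
        current = set(queue)
        queue.clear()
        # First inner loop: extract the answers of this resolvent.
        found = {s for (s, e) in edge if e in current}
        answers |= found
        # Second inner loop: queue the join values for the next resolvent.
        for s in found:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return answers


# A cyclic sample relation: c -> a -> b -> c, plus d -> a.
CYCLIC_EDGE = {("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")}
nodes = reach_to(CYCLIC_EDGE, "c")
```

Even though the sample relation contains a cycle, the loop stops once no new join values are produced.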

The time for serially evaluating several resolvents, on a single processor system, will be
very long. What we have studied is a strategy for the parallel evaluation of these resolvents,
which are generated by the recursive intensions, on a multi-processor system. Parallel
evaluation of the resolvents will eliminate the outer loop. In addition, identifying common
subexpressions in these resolvents will allow intermediate result sharing among these parallel
operations, thus simplifying the operations in the inner loops and allowing the two inner
loops to be executed simultaneously. Further, the evaluation strategy gains additional
parallelism by using a pipelining approach for executing database operations.

In [HAN86], the performance of several algorithms for processing a recursive query is
compared. The process of applying a recursive rule and generating longer resolvents is
compared to a wavefront; i.e., the saved result of an operation is used to derive a new result.
The algorithm DW (or double wavefront) is similar to our strategy (described in section 8.3)
in the manner it shares results among resolvents. However, the authors do not treat the
resolvents as a set of concurrent queries, nor do they consider horizontal and vertical concurrency
and pipelining techniques to increase the degree of parallelism and to improve the efficiency of
execution.


8.2    The Impact of Query Processing Techniques

In the previous section, a query of the form Reach(?,c) generated a sequence of
resolvents that had to be evaluated to provide answers to the query. Each of these resolvents can
be considered a retrieval query against the knowledge base and the set of resolvents can be
treated as a set of concurrent queries; DBMS query processing and optimization techniques
can then be used to optimize the execution of these concurrent queries.

Query decomposition is a process of translating a query into a hierarchy of primitive
operations; the result is a query tree in which the nodes represent the primitive operations
[AST76, ROT80, STO76, WON76]. The advantage of query decomposition is that it
identifies primitive operations on different branches of a query tree that can be executed in
parallel, i.e., "horizontal" concurrency, thus increasing the degree of parallelism. It also
increases the probability of finding an overlap among several query trees, which facilitates
intermediate result sharing.

The sharing of intermediate results among concurrent queries and the resulting
elimination of redundant execution of operations have been proposed in [FIN82] and [JAR84a]. In
[BOR84] and [CHO85], it has been shown that as the degree of sharing among concurrent
queries increases, the query throughput also increases. Most of this research studies the
effect of eliminating low-level read operations by sharing buffer space. More recent work
[MIK86 and SU86] shows the advantage of sharing the output of high-level operations such as
select, join, etc.

Both  query  decomposition  and  intermediate  result  sharing  have  an  impact  on  the 
evaluation  of  the  concurrent  resolvents.  Each  resolvent  is  equivalent  to  a  relational  algebra 
expression  [ULL85];  thus,  the  set  of  concurrent  queries  or  resolvents  can  be  decomposed  into 
a hierarchy of primitive algebraic operations, some of which can be evaluated in parallel. A
characteristic feature of a recursive intension is that each time the recursive clause is applied, it
generates a longer resolvent which is an extension of a previous resolvent [NAQ83 and HEN84]. Thus,
there  is  a  potential  for  identifying  common  sub-expressions  and  sharing  intermediate  results 
of  high-level  algebraic  operations  among  the  concurrent  queries.  For  example,  on  examining 
the  resolvents  in  expressions  (3),  (4)  and  (5),  we  see  that  (3)  is  a  sub-expression  of  (4),  (4)  is  a 
sub-expression  of  (5),  etc. 
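The benefit of sharing a common sub-expression can be seen by counting operations in a small Python sketch. Both function names and the sample relation are hypothetical; the first function evaluates each resolvent from scratch, while the second extends the saved result of the previous resolvent, since resolvent k contains resolvent k-1 as a sub-expression.

```python
def evaluate_independently(edge, c, n):
    """Evaluate resolvents 1..n separately; return (results, operations performed)."""
    results, ops = [], 0
    for k in range(1, n + 1):
        frontier = {c}
        for _ in range(k):                       # recompute the whole chain each time
            frontier = {s for (s, e) in edge if e in frontier}
            ops += 1
        results.append(frontier)
    return results, ops


def evaluate_with_sharing(edge, c, n):
    """Evaluate resolvents 1..n, reusing the result of the previous resolvent."""
    results, ops = [], 0
    frontier = {c}
    for _ in range(n):
        frontier = {s for (s, e) in edge if e in frontier}   # one extra join per resolvent
        ops += 1
        results.append(frontier)
    return results, ops


EDGES = {("a", "b"), ("b", "c"), ("d", "c"), ("e", "a")}
separate = evaluate_independently(EDGES, "c", 3)
shared = evaluate_with_sharing(EDGES, "c", 3)
```

Both strategies return the same answers for the three resolvents, but the shared version needs 3 operations where the independent version needs 6; in general n operations replace n(n+1)/2.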

The  third  technique  is  the  pipelining  and  data-flow  based  processing  approach  proposed 
for  several  database  machines  [BIC81,  BOR80,  HON84,  KIM84a  and  KIM84b].  Using  this 
technique,  each  processor  assigned  to  a  node  in  a  query  tree  transmits  a  block  of  information 
as  soon  as  it  is  produced.  This  is  in  contrast  to  traditional  distributed  systems  that  delay 
output  until  the  operation  assigned  to  the  node  is  completely  executed.  The  main  advantage 
of this data-flow based approach is the possibility of "vertical" concurrency, where an
operation at one level that requires input from an operation at a previous level can get its input
data  at  an  earlier  instant,  before  the  operation  at  the  previous  level  is  completed.  Pipelining 
has  a  significant  impact  on  the  efficiency  of  concurrent  queries  that  share  intermediate 
results  (such  as  the  resolvents  of  a  recursive  intension),  since  the  processors  can  get  their 
shared data earlier, and start execution sooner.

8.3    Our Method for Efficiently Evaluating Linear Recursive Queries

In this section, we describe a strategy which applies the DBMS query processing
techniques of section 8.2 to process a set of concurrent queries corresponding to the sequence of
resolvents  generated  by  a  recursive  rule. 

We first decompose each resolvent (query) into a hierarchy of primitive algebraic
operations that can benefit from pipelining. We identify possible parallelism in executing these
primitive operations as well as opportunities for intermediate result sharing among the
resolvents (queries). Finally, we determine a termination condition to halt the evaluation of these
concurrent  queries. 

Although it is advantageous to maximize both the number of primitive operations
executed in parallel and the intermediate result sharing among queries, the degree of
parallelism is limited by the availability of processors and the amount of result sharing is limited
by the bandwidth and the structure of the interconnection network. As the resolvents
become longer (by the repeated application of the recursive rules), there is increasing
opportunity for parallelism and there are several ways to decompose each resolvent into primitive
operations. However, to maximize the opportunity for result sharing, it is desirable to
decompose each resolvent so that it can share the greatest common sub-expression from a
previously evaluated resolvent. In addition, to improve execution efficiency, the
decomposition must first process operations on restricted relations, e.g., execute selection before
join. To avoid irregularity in the interconnection network and to simplify the network
structure, it is desirable to limit the sharing of results to adjacent resolvents.

We  use  the  example  of  the  transitive  closure,  T,  of  a  database  relation  A  with  two 
attributes  of  interest.   T  is  defined  as  follows: 

T(x,y)   :-    A(x,y) 

T(x,z)   :-    T(x,y),  A(y,z)

For convenience, we represent the database relation as A-ff where each "f" indicates an
attribute  that  is  free  (unbound).  We  use  "b"  to  indicate  a  variable  bound  to  a  constant 
value;  i.e.,  a  selection  based  on  an  attribute  value.  Thus,  A-bf  is  the  result  of  selecting  tuples 
from  the  database  relation  A,  based  on  the  value  of  the  first  attribute. 

Consider  an  example  query  which  is  a  verification  of  the  form  T(a,c),  where  a  and  c  are 
constants.  This  would  correspond  to  finding  all  paths  between  two  given  points  of  a  graph. 
Then,  to  answer  this  query,  a  series  of  database  queries  (resolvents)  Ti-bb,  as  seen  in  Figure 
8.1,  will  be  generated,  where  "i"  identifies  the  depth  of  the  resolvent  and  "b"  indicates  each 
variable  that  is  bound  to  a  constant.  The  terms  attribute  for  relations  and  variable  for 
predicates  are  used  interchangeably. 

T1-bb corresponds to the expression A(a,c) and the corresponding query will be (A-bb),
which is a direct selection of tuples from the relation A. The second resolvent, T2-bb,
corresponds to the expression A(a, y1), A(y1, c) where y1 is unbound and will be represented
by

(A-bf JN A-fb)

which is identified as a primitive operation in our evaluation. This primitive operation
comprises initial selections, A-bf and A-fb, from the relation A (corresponding to binding a
variable in a predicate to a constant value), a subsequent join (JN) operation over the
appropriate attribute, here y1, (corresponding to variable binding between clauses) followed
by a projection operation to produce answers corresponding to T2-bb of the form T(a,c).
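
As an illustration only, the primitive operation for T2-bb can be sketched in Python. The
sample relation A, the helper names select and join, and the use of Python sets to hold
relations are assumptions made for this sketch; they are not part of the system described here.

```python
def select(rel, first=None, second=None):
    """Selection on a binary relation: keep tuples matching the bound attribute values."""
    return {(x, y) for (x, y) in rel
            if (first is None or x == first) and (second is None or y == second)}


def join(left, right):
    """Join left's second attribute with right's first, then project onto the outer attributes."""
    index = {}
    for (y, z) in right:
        index.setdefault(y, set()).add(z)      # hash the right operand on the join attribute
    return {(x, z) for (x, y) in left for z in index.get(y, ())}


# Hypothetical contents of the database relation A (A-ff).
A = {("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")}

T1_bb = select(A, first="a", second="c")       # A-bb
A_bf = select(A, first="a")                    # A-bf
A_fb = select(A, second="c")                   # A-fb
T2_bb = join(A_bf, A_fb)                       # (A-bf JN A-fb), projected to T(a,c)
```

With this sample data there is no direct tuple (a,c) in A, so T1_bb is empty, while T2_bb
contains the answer (a,c) obtained through the intermediate value b.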

The primitive operation we have just described is typical of the operations resulting
from the decomposition of resolvents. If the variables are not bound, then the initial
selection will be omitted.

Resolvent T3-bb will be hierarchically decomposed into
( (A-bf JN A-ff) JN A-fb)
where (A-bf JN A-ff) will be evaluated first, etc.

Figure 8.2 shows the hierarchical decomposition of the resolvents into primitive
operations. Several of these operations can be executed in parallel. The legend [m]-j in the figure
represents those primitive operations at level m that can be executed in parallel. The level,
m, of the operation, is determined by the input requirements. For example, operations at
level 1, represented by [1]-j, do not require input from any other operation. However,
operations at level 2, [2]-j, are those that require input from a previous level, in this case, from
operations at level 1, etc. The value of j serves to distinguish between parallel operations at
the same level.

Those expressions that are common sub-expressions are labelled "common" in the
figure. For example, resolvents T3-bb and T4-bb have the expression (A-bf JN A-ff) in
common. To maximize result sharing, each resolvent is decomposed so that it can share the
maximum common sub-expression from its immediate predecessor; i.e., the first resolvent is
decomposed into its primitives and then the next resolvent is decomposed so as to share as
many sub-expressions as possible from the previous resolvent, etc. For example, the
resolvent T6-bb is decomposed into

        [1]-3    [2]-3    [3]-2    [2]-4    [1]-4
( ( (A-bf JN A-ff) JN A-ff) JN (A-ff JN (A-ff JN A-fb) ) )
rather than an alternative decomposition of

        [1]-3    [2]-p    [1]-5      [3]-q    [1]-4
( ( (A-bf JN A-ff) JN (A-ff JN A-ff) ) JN (A-ff JN A-fb) )

The first decomposition is based on using the largest sub-expressions, namely, the
output of operations [2]-3 and [2]-4. The second decomposition does not maximize result
sharing. It is also less efficient as it includes an operation [1]-5 which computes the join of two
unrestricted relations.

Figure 8.3 illustrates the advantages of maximizing result sharing and maintaining a
regular interconnection structure; i.e., it results in an evaluation strategy that is regular with
respect to primitive operations and interconnections. This regularity is advantageous if
special-purpose hardware is to be built to execute recursive queries. In this figure, the boxes
at each level represent the primitive operations that can be executed in parallel at that
level. The level, m, does not necessarily correspond to the depth, i, of the resolvent Ti being
evaluated at that level. For example, at level 2, operations [2]-1 and [2]-2 produce output for
resolvents T3-bb and T4-bb, respectively. Not all operations produce answers to the
query; some operations evaluate sub-expressions. For example, operations [2]-3 and [2]-4
evaluate the sub-expressions T3-bf and T3-fb, respectively.

Next, we examine the termination condition, based on the finiteness of domains. If the
relation A is finite, then the transitive closure T is also finite. Referring to Figure 8.3, if at
any level, m, the operation [m]-3 did not produce any output; i.e., Tm+1-bf was empty, then
the processing can be terminated at that level m, since answers cannot be produced at
subsequent levels (m+1). This also holds for the operation [m]-4 which evaluates Tm+1-fb.
However, in the case of cyclic databases (databases that have cyclic relations, e.g., {(a,b), (b,a)}),
determining the termination condition is more complex. Here, the operations [m]-3 and [m]-4
could produce output without necessarily producing new answers at subsequent levels. The
system must check the output of the operations [m]-3 and [m]-4 and determine that there are
tuples produced in Tm+1-bf and Tm+1-fb which will indeed produce answers. To do so,
Tm+1-bf must be compared with the set {A-bf, T2-bf, . . . , Tm-bf} to ensure that Tm+1-bf
is not a subset of the union of these relations. If Tm+1-bf is indeed such a subset, then no
new answers will be produced at subsequent levels and execution should terminate at that
level. The same holds for Tm+1-fb.
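
The termination test described above can be sketched as follows. This is a minimal Python
illustration, not the implementation used in this work: the function names and the
representation of relations as sets of pairs are assumptions, and only the Ti-bf side of the
test is shown (the Ti-fb side would be symmetric).

```python
def join_step(t_bf, a_ff):
    """Compute T(m+1)-bf from Tm-bf by joining with A-ff and projecting."""
    successors = {}
    for (y, z) in a_ff:
        successors.setdefault(y, set()).add(z)
    return {(x, z) for (x, y) in t_bf for z in successors.get(y, ())}


def evaluate_bf_levels(a_ff, a):
    """Return [T1-bf, T2-bf, ...] for the bound constant a, stopping when a level
    is empty or is contained in the union of all earlier levels."""
    levels = []
    seen = set()                                       # union of the earlier Ti-bf
    current = {(x, y) for (x, y) in a_ff if x == a}    # T1-bf = A-bf
    while current and not current <= seen:
        levels.append(current)
        seen |= current
        current = join_step(current, a_ff)
    return levels


acyclic_levels = evaluate_bf_levels({("a", "b"), ("b", "c")}, "a")
cyclic_levels = evaluate_bf_levels({("a", "b"), ("b", "a")}, "a")
```

For the acyclic relation the loop stops because the third level is empty. For the cyclic
relation {(a,b), (b,a)} the third level is non-empty but only repeats (a,b), so the subset
test is what halts the evaluation.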


8.4    An Algorithm for Evaluating a Linear Recursion

In this section, we present an algorithm for evaluating a linear recursive intension based
on the strategy described in section 8.3. In general, any linear recursive intension can be
expressed in the form:

S(.,.,. . .) :- M(.,.,. . .), S(.,.,. . .), P(.,.,. . .)     (8.1)

S(.,.,. . .) :- D(.,.,. . .)     (8.2)

M, D and P may either be extensional predicates (database relations) or intensional
predicates corresponding to relational algebra expressions which must be computed.
For a linear recursion, S must not be mutually recursive with M, D or P. See [BAN86] for a
definition of linear recursive rules.

Assume the query corresponds to S-b where b represents those variables of S that are
initially bound. Then, using the rules 8.1 and 8.2, we generate a series of resolvents, Si-b, as
follows:

S1-b      D-b

S2-b      M-bi JN D-bi JN P-bi

S3-b      M-bi JN M-br JN D-br JN P-br JN P-bi, etc.

where for M, D and P, bi represents those variables that are initially bound and br
represents those variables that are bound after each application of the recursive rule 8.1.
The variables bound in br, after each application of the recursive rule, cannot exceed the
variables that were previously bound.

We now present an algorithm, based on the method described in section 8.3, to
evaluate the resolvents Si-b, generated by a general (linear) recursive intension. The
algorithm exploits all the query optimization strategies that we have discussed. The algorithm
will identify primitive operations which will be marked [l]-o, where l is the level of the
operation and o is the operation number. An operation will be executed at level l if all of its
inputs are provided by operations at levels (l-1) or lower. All operations at a level can be
processed in parallel.

We use a variable last_used_level_op, represented by a pair (l,o), where l is the level
and o is the operation number. This pair aids in numbering the parallel operations at each
level, and the operation number o is initialized to 0 for each level.




INIT    This is the initialization step and it evaluates S1-b, corresponding to D-b, where D is
a database relation or a relational algebra expression.

Evaluating D-b is defined to be a primitive operation. This operation is marked [1]-1.
The value of last_used_level_op for level 1 must be updated from (1,0) to (1,1).

LOOP  This step is executed for each subsequent resolvent Si-b, where i > 1. For each Si-b,
starting from both ends and proceeding inwards, group literals together for a pairwise
join (JN), nesting the joins by using the results of previous groupings. For example:
( . . . ( ( (A JN B) JN C) JN D) . . . )
( . . . (D' JN (C' JN (B' JN A') ) ) . . . ), etc.

If there is an odd number of literals in the expression, then the grouping from the
left gets precedence. For example:
( ( (A JN B) JN C) JN (D JN E) )

Thus, for the resolvent S3-b, we group the literals M-bi, M-br, etc., to obtain the
following:
( ( (M-bi JN M-br) JN D-br) JN (P-br JN P-bi) )

Each of these pairwise joins comprises a primitive operation. After the grouping has
been completed for Si-b, we must mark each of these primitive operations with
a level and operation number. Starting with the most deeply nested pair and
proceeding outward, alternating from left to right, for each pair in Si-b, test the
following:

(a)  If this join has been previously computed by a resolvent, Sj-b, where j < i, then mark
this operation common.

(b)  If this join has not been computed then test the following cases:

(1)  If the two operands (literals) used in the join have not been previously computed by
Sj-b, where j < i, then mark this primitive operation [1]-(j+1), where the value of j is
obtained from the value pair of last_used_level_op (1,j). Increment this pair to (1,j+1).

(2)  If one of the operands has been previously computed by a primitive operation at level
k, then this operation must execute at level k+1. Thus, the operation will be marked
[k+1]-(j+1), where the value of j is obtained from the value pair of last_used_level_op
(k+1,j). Increment this pair to (k+1,j+1).

(3)  If both operands in this operation have been previously computed at levels k1 and k2,
respectively, then this operation will be marked [km+1]-(j+1), where km is the
maximum of (k1, k2), and the value of j is obtained from the value pair of
last_used_level_op (km+1,j). Increment this pair to (km+1,j+1).
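
The INIT and LOOP steps can be sketched in Python as shown below. This is an illustrative
sketch under several assumptions: resolvents are given as lists of literal names, a join is
represented as a nested pair of its operands, and the names group, mark_resolvents and
closure_resolvent are hypothetical. Cases (1) to (3) are folded into one rule by treating an
operand that is a plain literal as being at level 0.

```python
def group(literals):
    """Group two or more literals from both ends into nested pairwise joins.

    Returns the joins ordered from the most deeply nested outward, alternating
    left and right, with the outermost join last.
    """
    assert len(literals) >= 2
    half = (len(literals) + 1) // 2           # the left side gets the extra literal
    left, left_chain = literals[0], []
    for lit in literals[1:half]:
        left = (left, lit)                    # left-deep nesting
        left_chain.append(left)
    right, right_chain = literals[-1], []
    for lit in reversed(literals[half:-1]):
        right = (lit, right)                  # right-deep nesting
        right_chain.append(right)
    ordered = []
    for k in range(max(len(left_chain), len(right_chain))):
        ordered.extend(chain[k] for chain in (left_chain, right_chain) if k < len(chain))
    ordered.append((left, right))
    return ordered


def mark_resolvents(resolvents):
    """Mark every primitive operation with (level, operation number)."""
    marks = {}        # join expression (or INIT literal) -> (level, operation number)
    last_used = {}    # level -> last operation number used (last_used_level_op)

    def new_mark(level):
        last_used[level] = last_used.get(level, 0) + 1
        return (level, last_used[level])

    tops = []         # mark of the operation that produces each resolvent
    for literals in resolvents:
        if len(literals) == 1:                # INIT: a single primitive operation
            top = literals[0]
            if top not in marks:
                marks[top] = new_mark(1)
        else:                                 # LOOP
            for pair in group(literals):
                if pair in marks:             # case (a): common sub-expression
                    continue
                level = 1 + max(marks.get(operand, (0, 0))[0] for operand in pair)
                marks[pair] = new_mark(level) # cases (b)(1) to (b)(3)
            top = pair                        # the outermost join
        tops.append(marks[top])
    return marks, tops


def closure_resolvent(i):
    """Literals of Ti-bb for the transitive closure example of section 8.3."""
    return ["A-bb"] if i == 1 else ["A-bf"] + ["A-ff"] * (i - 2) + ["A-fb"]


marks, tops = mark_resolvents([closure_resolvent(i) for i in range(1, 7)])
```

Applied to T1-bb through T6-bb, the sketch reproduces the markings quoted in section 8.3 for
T6-bb: (A-bf JN A-ff) is [1]-3, (A-ff JN A-fb) is [1]-4, the two three-literal
sub-expressions are [2]-3 and [2]-4, and the outermost join is [3]-2.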

Figure 8.4 shows the markings of the primitive operations that compute Si-b and
Figure 8.5 shows the architecture to evaluate Si-b. The upper bound for the value of i, in Si-b,
can only be determined by the termination condition, described earlier, which depends on the
finiteness of data. With a limited number of available processors one may fix
breakpoints for i, e.g., i', i", etc., and compute {S1, S2, . . . , Si'-1}, then {Si', Si'+1, . . . , Si"-1},
etc.

Alternatively, one can assign processors to the parallel primitive operations at each level
until there are no more available processors. The evaluation results of the next section can be
easily modified for either of these situations.

8.5    Performance Evaluation

In  this  section,  we  measure  the  performance  of  this  strategy  for  evaluating  recursive 
queries  in  the  integrated  KBMS.  Specifically,  we  compare  the  performance  of  the  transitive 
closure  algorithm  (with  resolvents  Ti-bb)  and  a  linear  recursive  algorithm  (with  resolvents 
Si-b), with and without pipelining.

We compare the "distributed" approach which uses only horizontal concurrency, with
the data-flow and pipelining based approach which uses both horizontal and vertical
concurrency [MIK86 and SU86]. In both cases, we assume that parallel execution of primitive
operations  by  multiple  processors  and  intermediate  result  sharing  are  exploited.  With  the 
pipelined  approach,  a  block  of  data  is  transmitted  as  soon  as  it  is  produced.  A  block  is  the 
operand  granularity  for  input  and  output  (of  results).  Processing  at  level  (m+1)  commences 
as  soon  as  operations  at  this  level  have  a  block  of  data  at  their  input  nodes.  This  results  in 
vertical  concurrency  across  several  levels.  With  respect  to  recursive  intensions,  this  implies 
that  several  resolvents  Ti-bb  (Si-b),  will  be  evaluated  in  parallel. 

The  two  performance  measures  used  in  this  study  are  the  response  time  (or  the  time  to 
produce  the  first  block  of  data)  and  the  execution  time  (or  the  time  to  complete  processing 
an  operation).  We  measure  the  response  time  of  each  resolvent  or  query,  Ti-bb  (Si-b),  and 
the  execution  time  for  the  set  of  concurrent  queries,  for  some  depth,  i,  of  these  resolvents. 
We expect that the pipelined approach will have better response time and execution time
since each level will commence execution at a much earlier instant, as compared to the
distributed approach.

For  our  evaluation,  we  use  the  simple  hash  join  to  model  a  primitive  operation.  The 
analysis  of  main  memory  resident  database  systems  in  DEW84  suggests  that  hash  based 
query  processing  strategies  are  advantageous.  The  same  result  is  reported  in  MIK86  and 
SU86  for  the  data  flow  and  pipelined  approach.  We  assume  that  the  hash  tables  fit  into  main 
memory.  The  multipass  extension  of  the  simple  hash  join  [DEW84]  models  a  situation  where 
the hash table does not fit in main memory. We have not included the analysis of this
situation because our preliminary findings indicate that the resulting performance degradation
will  be  similar  for  both  cases  being  compared. 

Let the two relations to be joined be R1 and R2, and let their sizes (number of tuples)
be k1*B and k2*B, respectively, where B is the block size expressed as the number of tuples
in a block. Assume k1 > k2. Let Tbr be the time to input a block, Th the time for hashing
the value of an attribute over which the join is to be performed, Tw the time to write a
tuple in memory, and Tc the time to compare a hashed value with values in the stored hash
table. Let j be the join selectivity defined as
j = (number of join tuples output) / (k1*k2*B*B)

Using  values  in  DEW84,  we  set  values  of  9,  20  and  3  microseconds  for  Th,  Tw  and  Tc, 
respectively.  The  time  for  a  sequential  I/O  operation  was  set  at  10  milliseconds  per  page,  for 
a  page  size  of  40  tuples.  In  our  analysis,  the  block  size,  B,  is  a  parameter;  thus,  we  vary  the 
value  of  Tbr  from  5  milliseconds  (B  =  20)  to  25  milliseconds  (B  =  100).  For  blocks  of  shared 
results,  the  input  time  will  be  the  transfer  time  across  the  network.  We  assume  the  same 
values for the transfer time as for the sequential I/O operation. We assume a selectivity
factor, s, for both A-bf and A-fb of 10 percent. We do not vary s, as it only occurs at the first
level  and  its  effect  is  negligible. 

For the distributed approach, the smaller relation, R2, will be read first and hashed, and the
hash table is stored in memory. The larger relation, R1, will then be read, hashed and
compared with the stored hash table. Note that a 20 percent overhead accommodates the extra
comparisons required when comparing values using a hash table [DEW84]. If there is a
match, then the two matching tuples will be output. The selection only occurs at level 1 and
will be considered as part of the input time. Any final projections will be included in the
time to move the join output tuples to the buffer. The time spent to transmit the final result
is also the input time of the next level operation(s) that use this result. However, we do not
assume overlap in the I/O and processing times of an operation. The execution time of the
primitive operation in the distributed case is the same as the response time, since output is
not transmitted until processing is complete. It is the following:
time to read, hash and store tuples of R2 +
{ = Tbr*k2 + Th*k2*B + Tw*k2*B }
time to read, hash and compare tuples of R1 +
{ = Tbr*k1 + Th*k1*B + Tc*k1*B*1.2 }
time to output tuples of join result
{ = Tw*2*j*B*B*k1*k2 }
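
The distributed-case expression can be evaluated directly. The following Python sketch simply
transcribes the three terms above, using the parameter values stated in the text (Th, Tw and
Tc of 9, 20 and 3 microseconds, and 10 milliseconds per 40-tuple page). The function names and
the choice of seconds as the unit are assumptions of the sketch.

```python
TH, TW, TC = 9e-6, 20e-6, 3e-6            # hash, write and compare times, in seconds


def block_read_time(B, page_time=10e-3, page_size=40):
    """Tbr: time to input one block of B tuples at 10 ms per 40-tuple page."""
    return page_time * B / page_size


def distributed_join_time(k1, k2, B, j):
    """Execution (and response) time of one hash join in the distributed case."""
    tbr = block_read_time(B)
    build = tbr * k2 + TH * k2 * B + TW * k2 * B            # read, hash and store R2
    probe = tbr * k1 + TH * k1 * B + TC * k1 * B * 1.2      # read, hash and compare R1
    output = TW * 2 * j * B * B * k1 * k2                   # write the matching pairs
    return build + probe + output
```

For example, block_read_time(20) and block_read_time(100) give the 5 and 25 millisecond
values of Tbr used in the analysis.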

In the pipelined approach, the first block of R2 will be read, hashed and stored in
memory. The first block of R1 will be read, hashed and compared with the current contents of
the hash table, and the join output, i.e., the pairs of matching tuples from both relations, will
be written into an output buffer. R1 will also be stored in the hash table for further
comparison with subsequent blocks of R2. The subsequent blocks of R1 and R2 will be treated in
a similar fashion. As soon as the number of tuples in the output buffer exceeds B, a block of
output will be transmitted. After the last (k2-th) block of R2 is processed, the remaining
blocks of R1 need not be stored in the hash table.
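
The block-at-a-time behaviour described above can be sketched with a Python generator. This
is only an illustration: the function names, the use of two dictionaries as hash tables, the
key-extraction arguments key1 and key2, and the flushing of a final partial block are all
assumptions of the sketch rather than details given in the text.

```python
def blocks(relation, B):
    """Split a list of tuples into blocks of at most B tuples."""
    return [relation[i:i + B] for i in range(0, len(relation), B)]


def pipelined_hash_join(r1_blocks, r2_blocks, key1, key2, block_size):
    """Yield blocks of matching (t1, t2) pairs as soon as they are available."""
    table1, table2 = {}, {}                   # hash tables for stored R1 and R2 tuples
    buffer = []                               # output buffer
    k1, k2 = len(r1_blocks), len(r2_blocks)
    for i in range(max(k1, k2)):
        if i < k2:                            # next block of R2: store it, probe stored R1
            for t2 in r2_blocks[i]:
                table2.setdefault(key2(t2), []).append(t2)
                buffer.extend((t1, t2) for t1 in table1.get(key2(t2), ()))
        if i < k1:                            # next block of R1: probe stored R2
            for t1 in r1_blocks[i]:
                buffer.extend((t1, t2) for t2 in table2.get(key1(t1), ()))
                if i < k2:                    # store R1 only while R2 blocks remain
                    table1.setdefault(key1(t1), []).append(t1)
        while len(buffer) >= block_size:      # transmit every full block immediately
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    if buffer:                                # flush the final partial block
        yield buffer


# Hypothetical sample data: join R1's second attribute with R2's first attribute.
R1 = [(i, i % 3) for i in range(7)]
R2 = [(k % 3, "v%d" % k) for k in range(4)]
out_blocks = list(pipelined_hash_join(blocks(R1, 2), blocks(R2, 2),
                                      lambda t: t[1], lambda t: t[0], 2))
```

Each matching pair is produced exactly once: either when the R1 tuple probes the stored R2
tuples, or when a later R2 tuple probes the stored R1 tuples.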

For each block i, where i = 1, . . . , k1, Tin-R1i and Tin-R2i are the times to read, hash
and store (optional) the i-th blocks of R1 and R2, respectively. Tcompi is the time spent in
comparing hashed values with the hash table and Touti is the time spent in output of the join
result. For i = 1 the following hold:
Tin-R1i = Tbr + Th*B + Tw*B;
Tin-R2i = Tbr + Th*B + Tw*B;
Tcompi = Tc*B*1.2;   Touti = Tw*2*j*B*B

For subsequent blocks i = 2, 3, . . . , k2, the following hold:
Tin-R1i = Tin-R11;   Tin-R2i = Tin-R21   (the values for the first block)
Tcompi = Tc*2*B*1.2;   Touti = Tw*2*j*(2*i - 1)*B*B

For blocks i = k2+1, . . . , k1, the following hold:
Tin-R1i = Tbr + Th*B;   Tin-R2i = 0
Tcompi = Tc*B*1.2;   Touti = Tw*2*j*k2*B*B
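
The per-block expressions above can be transcribed into a short Python sketch, which also
gives a consistency check: summed over all k1 blocks, the output times equal the output term
Tw*2*j*B*B*k1*k2 of the distributed case, since the same join result is produced in both
cases. The function names and the sample parameter values used in the check are assumptions.

```python
TH, TW, TC = 9e-6, 20e-6, 3e-6            # hash, write and compare times, in seconds


def block_times(i, k2, B, j, tbr):
    """Return (Tin-R1i, Tin-R2i, Tcompi, Touti) for block i (1-based), assuming k1 >= k2."""
    if i == 1:
        return (tbr + TH * B + TW * B, tbr + TH * B + TW * B,
                TC * B * 1.2, TW * 2 * j * B * B)
    if i <= k2:
        return (tbr + TH * B + TW * B, tbr + TH * B + TW * B,
                TC * 2 * B * 1.2, TW * 2 * j * (2 * i - 1) * B * B)
    return (tbr + TH * B, 0.0, TC * B * 1.2, TW * 2 * j * k2 * B * B)


def total_output_time(k1, k2, B, j, tbr):
    """Sum of Touti over all k1 blocks."""
    return sum(block_times(i, k2, B, j, tbr)[3] for i in range(1, k1 + 1))


# Hypothetical sample values for the consistency check.
k1, k2, B, j, tbr = 5, 3, 20, 1e-3, 0.005
pipelined_output = total_output_time(k1, k2, B, j, tbr)
distributed_output = TW * 2 * j * B * B * k1 * k2
```

The two totals agree because the factors (2*i - 1) for i = 1, . . . , k2 sum to k2*k2 and the
remaining (k1 - k2) blocks each contribute a factor of k2.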

In MIK86 and SU86, the output rate of the pipelined join was averaged over the
execution time to obtain an average rate of output. This is not an accurate model for
accumulation type operations such as a join, for which, as more input blocks are accumulated, a single
block of input will be compared against several blocks and the number of output tuples
produced will increase. We have modeled a varying output rate for the join operation. The
following discussion elaborates our model. In any data-flow based algorithm, the rate of
output blocks produced by an operation is determined by the availability of input, as long as the
i-th block of input can be completely processed before the (i+1)-th input block is available;
i.e., the output rate is determined by the input rate (which is the output rate of the previous
operation providing the input). At some point, the input blocks are available at a faster rate
than they can be consumed. This is the critical point and after this point, the output rate is
determined by the processing rate of the operation itself.

Figure 8.3 (or Figure 8.5) shows that a sequence of operations, [1]-3, [2]-3, . . . , [m-1]-3,
etc., controls the availability of input for [m]-1 and can be considered a critical path for
[m]-1. In Figure 8.6, we have a sample graph describing the rate of producing output blocks as a
function of the number of input blocks consumed, for the critical path operation [2]-3 of
Figure 8.3. The relationship between the number of input blocks consumed (bi) and the number
of output blocks produced (bo) is the following:
j*bi*bi*B*B = bo*B     if j*k2*k2*B*B > bo*B     (8.3)
j*bi*k2*B*B = bo*B     otherwise                 (8.4)

The time for operation P to consume (process) bi blocks of input is defined as Tproc(P,bi)
and the time to produce bo blocks of output is defined as Tprod(P,bo). Figure 8.6 shows
plots for two values of N (the number of tuples in the database relation A-ff) equal to 10000
and 200000 and different block sizes (B = 20 and B = 40). Our results show that the output
rate increases gradually with increasing input being consumed. Assuming that the processing
speeds of the operations along the pipeline are matched, the critical point for an operation, at
level k, corresponds to the first block of input (bc) at level (k-1) that produces more than one
block of output. The critical point is the least value of bc satisfying the following:
j*(2*bc - 1)*B*B > B

The response time and execution time for operations at level m will be defined
recursively with respect to operations at level m-1 which provide input. For level 1 operations,
these values can be obtained directly using the expressions for Tcompi, Touti, etc.

The response time for any primitive operation Pi, Tres(Pi), will be a function of r1, the
number of input blocks needed to produce the first output block. By substituting (bo=1) in
either equation 8.3 or 8.4, the value of r1 (=bi) can be obtained. Tres(Pi) is also a function of
the response time of the operation(s) providing input, Pi-1 and Pi-1'. Operation Pi requires r1
blocks of input, each, from Pi-1 and Pi-1', to produce the first block of output. In all our
experiments, the r1 blocks of input were produced before the critical point; thus, the
response time is also determined by the output rates of the operations Pi-1 and Pi-1'. The
following holds:

Tres(Pi) = maximum of the response times of Pi-1, Pi-1'
{ max [ Tres(Pi-1), Tres(Pi-1') ] }
+ maximum time for Pi-1, Pi-1' to produce r1 blocks
{ max [ Tprod(Pi-1,r1), Tprod(Pi-1',r1) ] }

Note that this time must be adjusted to account for the fact that the first block is
already available.
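
The quantities r1 and bc can be found by a direct search over the number of input blocks.
The Python sketch below is an illustration under stated assumptions: the function names are
hypothetical, r1 is taken as the least bi for which the left-hand side of equation 8.3 (for
bi up to k2) or 8.4 (beyond k2) reaches one output block, and None is returned when no such
block exists within the available input.

```python
def first_output_blocks(j, B, k1, k2):
    """r1: least number of input blocks bi that yields one full output block (bo = 1)."""
    for bi in range(1, k1 + 1):
        # Equation 8.3 while both inputs are still arriving, equation 8.4 afterwards.
        produced = j * bi * bi * B * B if bi <= k2 else j * bi * k2 * B * B
        if produced >= B:
            return bi
    return None


def critical_point(j, B, k2):
    """bc: least input block that alone produces more than one block of output."""
    for bc in range(1, k2 + 1):
        if j * (2 * bc - 1) * B * B > B:
            return bc
    return None


# Hypothetical values chosen so that the arithmetic is exact in floating point.
r1 = first_output_blocks(j=1 / 64, B=4, k1=32, k2=16)
bc = critical_point(j=1 / 64, B=4, k2=16)
```

In this small example r1 is smaller than bc, which corresponds to the situation reported
above, where the first output block is produced before the critical point is reached.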

To determine the execution time of operation Pi, Texec(Pi), we first determine the
critical point of operations, Pi-1 and Pi-1', which provide input, and the corresponding output
block, pc or pc', produced at the critical point. Before the critical point the output rate will
be controlled by Pi-1 (or Pi-1'), and after the critical point the output rate will be controlled
by operation Pi. If operation Pi processes (consumes) a maximum of k1 blocks, then the
following holds:

Texec(Pi) = maximum of the response times of Pi-1, Pi-1'
{ max [ Tres(Pi-1), Tres(Pi-1') ] }
+ max. time for Pi-1, Pi-1' to produce pc, pc' blocks, respectively
{ max [ Tprod(Pi-1,pc), Tprod(Pi-1',pc') ] }
+ time for Pi to process a max. of (k1-pc) or (k1-pc') blocks
{ Tproc(Pi,k1) - Tproc(Pi,pcmin) }; pcmin is min [pc, pc']

This expression models the worst case situation for evaluating Texec(Pi).
The expression Tproc(Pi,p), for any operation Pi, is given by the following:

Tproc(Pi,p) = sum over i = 1, . . . , p of [ Tin-R1i + Tin-R2i + Tcompi + Touti ]

The expression Tprod(Pi,p), for any operation Pi at level 1 is given by

Tprod(Pi,p) = sum over i = 1, . . . , p' of [ Tin-R1i + Tin-R2i + Tcompi + Touti ]

where p' is the number of input blocks consumed to produce p blocks of output. For
subsequent levels,

Tprod(Pi,p) = max [ Tprod(Pi-1,p'), Tprod(Pi-1',p') ]

We study the performance of this evaluation strategy with and without pipelining, with
respect to three parameters for the transitive closure and one parameter for the general
linear recursion. The first parameter is the block size, B, which we vary from 20 to 100.
The value of Tbr will also vary correspondingly. The second parameter is the join selectivity,
j, of the critical path operations and the operations that produce answers. The join
selectivity is not an absolute value but is defined as a ratio; thus we vary j in proportion to the
sizes of the input relations for each operation. The third parameter is the number of tuples,
N, of the initial database relation, noted A-ff. We vary this parameter from 200000 to
800000. The fourth parameter that we study is the complexity of the relational algebraic
expressions M, D or P, in the general linear recursion.

Figure 8.7 shows the response time and the execution time for both the pipelined and
distributed cases as a function of the depth i of the resolvents, Ti-bb. In this plot, the value
of N is 200000, B is 20 and j is 5*10^-6. This value of j (=1/N) ensures that the size of the
output produced by the critical operations is roughly equal at different levels along the
pipeline. The figure shows that for small i, with pipelining, the response time is much less than
the execution time. As i increases these two curves tend to move closer. The reason is that
for small i there are fewer operations (and delays) along the critical path and thus, the response
time is small. As i increases, there are more operations (and delays) along the critical path
which tend to increase the delay in producing the first block of output for Ti-bb. The figure
also indicates that for small i, the execution times for the distributed and pipelined
approaches are very close but these curves tend to diverge with increasing i. This is because
for small i, there are fewer operations in the pipeline and the advantage of pipelining on the
execution time is limited. As i increases, the number of operations in the pipeline increases
and the performance improvement increases correspondingly.

In  Figure  8.8,  we  show  the  effect  of  the  block  size,  B,  (or  operand  granularity)  on  the 
response  time.  We  plot  the  ratio  of  the  response  time  in  the  distributed  case  to  the  response 
time  with  pipelining,  as  a  function  of  the  block  size.  We  examine  three  resolvents,  T4-bb, 
T6-bb  and  T8-bb.  This  ratio,  which  is  proportional  to  the  performance  improvement  due  to 
pipelining,  is  largest  for  B=20,  and  gradually  decreases  with  increasing  block  size.  This  is 
true  for  all  resolvents.  The  reason  is  that  the  response  time  is  closely  related  to  the  block 
size. Fewer tuples have to be consumed to produce a small block of output, and this
reduces the response time. As the block size increases, more input tuples have to be
consumed to produce the first block and the response time increases.
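
The dependence of the first-block delay on the block size can be illustrated with a toy operator. The sketch below is an assumption-laden stand-in, not one of the join operations analysed here: the hypothetical function `first_block_cost` reads an unbounded stream of integers, keeps one input tuple out of every `keep_every`, and reports how many inputs it had to consume before a block of `block_size` output tuples was full.

```python
from itertools import count


def first_block_cost(block_size, keep_every=4):
    """Return how many input tuples are consumed before the first output block is full.

    Assumption: a toy operator that keeps one input tuple out of every `keep_every`.
    """
    block = []
    for consumed, tup in enumerate(count(), start=1):
        if tup % keep_every == 0:       # stand-in for "this tuple contributes to the output"
            block.append(tup)
        if len(block) == block_size:    # the block is full, so it could be transmitted
            return consumed


for b in (20, 40, 80):
    print(b, first_block_cost(b))
```

The number of inputs consumed grows with the block size, which is the effect that lengthens the pipelined response time for larger B.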

Figure  8.9  shows  the  effect  of  block  size  on  the  execution  time.  For  resolvents  T24-bb 
and  T32-bb,  we  plot  the  execution  time  of  the  distributed  case  and  the  pipelined  case,  as  a 
function  of  B.  The  execution  time,  which  is  the  time  to  complete  processing  all  blocks,  is  not 
as sensitive to pipelining as the response time. However, with increasing values for B, the
execution time decreases slightly. This is because the delays along the critical path seem to be
larger for smaller block sizes; i.e., with smaller block sizes, the input rate controls (and delays)
the pipeline rate along the critical path for a longer time. This delay has a corresponding
effect on the execution time, and the execution time is slightly less with larger block sizes. If
we assumed a penalty for transmitting smaller blocks, then the execution time for the
distributed case would also decrease slightly with larger block sizes.

Figure 8.10 shows the effect of the join selectivity, j, on the response time for various
resolvents. The ratio of the response time in the distributed case to the response time with
pipelining is plotted as a function of j. This ratio is seen to increase with increasing join
selectivity for all resolvents. The reason is that with increasing values of j, fewer input blocks
have to be consumed to produce the first block of output. As a result, the response time for
the pipelined case is smaller.
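
A short worked calculation makes the role of j concrete. The sketch assumes the usual reading of join selectivity as the ratio of the result size to the size of the cross product of the two operands, and it assumes that both operands hold N tuples; the function names are hypothetical. Exact fractions are used so that j = 1/N is not subject to rounding.

```python
from fractions import Fraction
from math import ceil


def join_output_size(j, left_size, right_size):
    """Expected join result size when selectivity j is a fraction of the cross product."""
    return j * left_size * right_size


def pairs_for_first_block(j, block_size):
    """Expected number of tuple pairs to compare before block_size matches are found."""
    return ceil(block_size / j)


N = 200000
j = Fraction(1, N)   # the setting quoted for Figure 8.7, j = 1/N

print(join_output_size(j, N, N))         # output size stays at N from level to level
print(pairs_for_first_block(j, 20))      # work needed for the first block of B = 20
print(pairs_for_first_block(2 * j, 20))  # doubling j halves that work
```

Under these assumptions a join of two N-tuple operands again yields N tuples when j = 1/N, and a larger j reduces the input that must be examined before the first block is available.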

In Figure 8.11, we show the effect of N, the number of tuples of the database relation,
A-ff, on the execution time for the distributed and pipelined cases. Note that the value of
join selectivity is varied correspondingly; this ensures that the size of the answers produced
for each resolvent is proportional to the size of the input relations and the analysis is
unbiased by arbitrary sizes of the output relations. The execution times for two resolvents,
T24-bb and T32-bb, are shown. Each of these plots is linear in N, as is expected since the
execution time must be proportional to the size of input relations being processed. However,
the execution time for the distributed case increases more rapidly with increasing N than
that for the pipelined case, resulting in enhanced performance due to pipelining. The reason
is that after the initial delays in setting up the pipeline, the longer the pipeline operates
under steady state, the greater the benefit of pipelining. With increasing N, the pipeline
operates longer under steady state, hence the improved performance.

Finally, we study the effect of introducing some complexity to the primitive operations
executed at each level of Figure 8.5, to obtain answers for the linear recursive query Si-b.
Suppose that M, D and P are no longer extensional predicates but that each corresponds to a
join expression; i.e., input blocks have to be read from two database relations and joined to
obtain a block of M, D or P. Now, a primitive operation such as [1]-2 of Figure 8.5 will
comprise two initial joins to produce blocks for M-bi and D-br and then a subsequent join of
M-bi and D-br, to produce the output for operation [1]-2. The number of input blocks
required by the initial join operations (to produce a single block of M, D or P) can be used
as a complexity measure for each primitive operation.

Figure 8.12 shows the execution time for the distributed and the pipelined case as the
complexity of the primitive operation is varied. As described earlier, in the pipelined case, a
block of output is transmitted as soon as it is produced. For example, operation [1]-2 of
Figure 8.5 produces a block of output as soon as it accumulates enough input to (initially)
produce sufficient blocks of both M-bi and D-br so as to produce one block of the final output.

Figure 8.12 shows that as the complexity increases, the execution time for the
distributed case increases much more rapidly than the execution time for the pipelined case. This
means that with increasing complexity, the enhancement due to pipelining increases. The
reason is that as the complexity increases, the processing time along the pipeline also
increases. The pipeline operates longer under steady state and the overlap in processing time
provided by pipelining, and hence the enhancement, is greater. The enhancement due to
pipelining is also greater with increasing depth, i, of the resolvent, e.g., from S8-b to S16-b.
To explain, as the depth, i, of the resolvents increases, the depth of operations in the
pipeline also increases and with it the enhancement due to pipelining.

8.6    Summary and Extensions to the Evaluation

To summarize, in this chapter, we showed the benefits that derive from the functional
integration of a DBMS and a rule processing system in the integrated KBMS. To illustrate,
we presented a strategy for the concurrent evaluation of the resolvents, generated by a linear
recursive query, using DBMS query processing and optimization techniques. Analytical
formulae were derived for the response times and execution times of the concurrent queries
and the performance gain due to pipelining was examined.

To  summarize  the  results  of  our  analysis,  the  pipelined  approach  with  both  vertical  and 
horizontal  concurrency  always  performed  better  than  the  distributed  approach,  which  uses 
only horizontal concurrency; both approaches used intermediate result sharing. The effect
of pipelining on the response time is much more pronounced than its effect on the execution
time; this is as expected since pipelining inherently produces data at an earlier instant. The
effects of the block size, B, and the join selectivity, j, on the response time are similarly
explained. The advantages of pipelining are more marked with longer sequences of operations
in the pipeline, e.g., with increasing depth, i, of the resolvents, Ti-bb. The advantages of
pipelining are also greater the longer the pipeline operates under the steady state (after the
initial delays), e.g., with larger database relations or with increasing complexity of the
primitive operations executed in the pipeline.

As  mentioned  earlier,  M,  D  and  P  could  be  intensional  predicates,  corresponding  to  re- 
lational algebra  expressions.  In  our  algorithm,  query  decomposition  assumed  that  the  evalua- 
tion of  each  of  these  predicates  was  an  indivisible  operation  and  we  shared  the  intermediate 
results of joins using these predicates as operands. As the complexity of the expressions that
represent these predicates increases, it may be advantageous to further decompose each of
these  predicates  as  this  may  provide  greater  opportunity  for  result  sharing.  However,  unlike 
the  decomposition  and  result  sharing  described  in  the  algorithm,  this  decomposition  and 
result  sharing  would  depend  on  the  actual  relational  algebra  expressions  corresponding  to  the 
predicates  and  the  specific  variable  bindings  involved. 

For  example,  if  we  consider  the  predicate  D  with  bindings  corresponding  to  b,  bi,  br, 
br',  etc.,  then  either  by  decomposing  the  literals  D-b,  D-bi,  D-br,  D-br',  etc.,  or  using  the 
literals themselves, there may be opportunity to benefit from result sharing. Result sharing
such as this, which is specific to a query, and the cost benefit introduced by this optimization
have been reported in MIK86 and SEL86. We have not considered these benefits in our
evaluation and this would be a fruitful topic for further research.

T1-bb    A(a,c)
T2-bb    A(a,y1), A(y1,c)
T3-bb    A(a,y2), A(y2,y1), A(y1,c)
T4-bb    A(a,y3), A(y3,y2), A(y2,y1), A(y1,c)
T5-bb    A(a,y4), A(y4,y3), A(y3,y2), A(y2,y1), A(y1,c)
T6-bb    A(a,y5), A(y5,y4), A(y4,y3), A(y3,y2), A(y2,y1), A(y1,c)

Figure 8.1   Expressions Corresponding to Resolvents Ti-bb
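
The expressions in Figure 8.1 follow a regular pattern: the resolvent of depth i is a chain of i literals over A that links the constant a to the constant c through i-1 intermediate variables. A minimal sketch that generates this pattern is given below; the function name `resolvent` is hypothetical and is used only for illustration.

```python
def resolvent(i):
    """Return the body of Ti-bb as a list of literals over the relation A."""
    if i < 1:
        raise ValueError("depth must be at least 1")
    # The arguments run a, y(i-1), ..., y1, c, following the pattern of Figure 8.1.
    terms = ["a"] + [f"y{k}" for k in range(i - 1, 0, -1)] + ["c"]
    return [f"A({x},{y})" for x, y in zip(terms, terms[1:])]


for i in range(1, 5):
    print(f"T{i}-bb", ", ".join(resolvent(i)))
```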

T1-bb    [1]-1                   A-bb
T2-bb    [1]-2                   (A-bf JN A-fb)
T3-bb    [1]-3, [2]-1            ((A-bf JN A-ff) JN A-fb)
T4-bb    common, [2]-2, [1]-4    ((A-bf JN A-ff) JN (A-ff JN A-fb))
T5-bb

Figure 8.2  Identifying Parallel Primitive Operations and Common
Sub-expressions in Evaluating Resolvents Ti-bb
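
The idea of sharing intermediate results between resolvents can be illustrated with a much simpler scheme than the one in Figure 8.2. The sketch below assumes that A is held in memory as a set of pairs and evaluates the resolvents in a left-deep order, so that the intermediate result computed for depth i-1 is reused for depth i. It does not reproduce the parallel primitive operations or the balanced join grouping shown for T4-bb; the function name `evaluate_resolvents` and the sample data are hypothetical.

```python
def evaluate_resolvents(edges, a, c, max_depth):
    """Return {i: True/False} telling whether Ti-bb holds for the constants a and c.

    `edges` is the relation A given as a set of (x, y) pairs.
    """
    results = {}
    frontier = {a}                       # values reachable from a in exactly 0 steps
    for depth in range(1, max_depth + 1):
        # One join step: extend the previous intermediate result by one A tuple.
        frontier = {y for (x, y) in edges if x in frontier}
        results[depth] = c in frontier   # final selection on the bound second argument
    return results


A = {(1, 2), (2, 3), (3, 4), (1, 3)}
print(evaluate_resolvents(A, 1, 4, 4))
```

Each depth costs one additional join against A because the partial chain starting at a is computed once and shared by all deeper resolvents.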

Figure 8.3    Architecture for Evaluating Transitive Closure: Parallel
Operations, Interconnections and Data Flow

Figure 8.4  Evaluating the Resolvents Generated by a General
Linear Recursive Clause

Figure 8.5    Architecture for Evaluating the Resolvents Generated
by a Linear Recursive Clause

Figure 8.7    Response Time and Execution Time for Distributed and
Pipelined Cases versus the Depth i of Resolvents

Figure 8.8    Ratio of Response Time for the Distributed and
Pipelined Cases versus the Block Size B

Figure 8.10    Ratio of Response Time for the Distributed and
Pipelined Cases versus the Join Selectivity j

Figure 8.12    Execution Time for the Distributed and Pipelined
Cases versus the Complexity C of Operations at each Node


CHAPTER  IX 
SUMMARY  AND  FUTURE  RESEARCH 

9.1    Summary 

To  summarize  the  research  presented  in  this  dissertation,  we  identified  several  key 
issues  in  the  design  of  a  KBMS.  The  design  of  the  knowledge  representation  model 
emphasized  both  integration  and  the  object-oriented  paradigm.  The  class-subclass  and  com- 
ponent hierarchies  of  objects  were  used  to  organize  rule-based  problem  solving  knowledge. 
The feature of encapsulation was used to define rules for the object types and to integrate the
fact base and the rule base within the object types of a single integrated knowledge base.
Integration was also emphasized in the design of the KML; KML constructs were used to
specify operations and the declarative and operational semantics of knowledge in rules defined
for the object types.

Processing  in  the  KBMS  has  a  flavor  of  both  transaction  oriented  DBMS  processing  and 
rule-based  AI  reasoning.  An  MME  cycle  applies  relevant  rules  while  executing  a  KBMS  tran- 
saction, comprising  KML  operations,  against  the  object  occurrences  in  the  knowledge  base. 
Rules,  expressed  in  the  KML  constructs,  modify  the  transaction.  Rules  are  incorporated  into 
the  transaction  when  explicitly  selected  for  execution.  Implicitly  selected  rules  are  treated  as 
independent  transactions. 

Common  characterization  of  the  DBMS  component  that  executes  operations  and  the  AI 
rule  processing  component  leads  to  functional  integration  and  the  successful  migration  of 
techniques  from  DBMS  technology  to  the  functionally  integrated  KBMS.  The  two  tech- 
niques we  investigated  and  applied  were  query  optimization  and  the  concurrent  execution  of 
transactions. 

Query optimization was applied to a KBMS transaction evaluating a linear recursive
query. The set of retrieval operations, generated by the recursive rules, was optimized using
three query optimization strategies identified for use in DBMS technology.

Concurrent  execution  of  transactions  was  studied  in  the  context  of  implicitly  selected 
value  dependent  rules.  Each  rule  was  treated  as  an  independent  transaction  and  the  inter- 
leaved execution  of  the  set  of  transactions  was  studied. 

9.2    Future  Research 

It  is  evident  that  our  current  study  represents  a  first  pass  attempt  to  solve  the  complex 
task  of  integrating  AI  and  DBMS  technology.  Its  major  contribution  is  identifying  crucial 
features  in  the  design  of  a  KBMS  and  introducing  topics  for  future  research  that  are  critical 
to  the  integration  task  at  hand. 

Applying  rules  in  a  transaction  oriented  processing  environment,  represented  by  the 
MME  cycle,  is  a  novel  processing  paradigm  introduced  in  this  study.  We  characterize  rule 
processing  using  DBMS  retrieval  and  storage  manipulation  operations  and  modify  transac- 
tions using  operational  semantics  captured  in  rules.  An  added  feature  was  the  compiled  and 
interpretive  approach  to  applying  rules.  Consequently,  the  modified  transaction  was  not 
completely  static  during  execution  and  could  be  further  modified  while  evaluating  value 
dependent  rules. 

The OPS5 system used in the prototype simulation of the MME cycle was limited in its
scope.  The  MME  cycle  must  be  simulated  in  a  transaction  oriented  processing  environment 
to  study  the  effectiveness  of  this  paradigm  and  the  benefits  of  query  optimization  and  con- 
current execution. 

The  migration  of  techniques  that  we  studied  has  relevance  and  potential  for  being  used 
outside  the  proposed  MME  cycle.  In  the  literature  review  in  Chapter  Two,  we  described  the 
interface approach to integrating AI and DBMS technology. Such systems could also
potentially benefit from migration. An interface between a resolution based logic
programming system, e.g., PROLOG, and a transaction oriented relational DBMS generates similar
retrieval queries that might be optimized as suggested in our research. A similar interface
between  an  OPS5  style  production  system  and  a  DBMS  may  benefit  from  our  research  on 
the  interleaved  execution  of  a  set  of  concurrent  KBMS  transactions,  if  each  production  is 
treated  as  an  independent  transaction. 

The  design  of  a  KML  was  out  of  the  scope  of  our  study  but  is  essential  to  the  task  of 
integration.  In  addition  to  supporting  conventional  DML  constructs  in  an  object-oriented  en- 
vironment, the  design  of  the  KML  must  address  the  following  issues: 

(a)  parameter  passing  and  variable  binding  in  explicit  inference  chains 

(b)  variable  binding  in  implicitly  selected  rules. 

AI  technology  has  addressed  several  issues  in  meta-reasoning,  i.e.,  reasoning  about 
knowledge.  In  rule-based  systems,  meta-reasoning  includes  consistency  between  rules,  com- 
pleteness of  a  set  of  rules,  etc.  Current  research  on  these  issues  has  focused  on  rules  which 
capture  declarative  semantics,  alone.  In  our  knowledge  component,  we  include  an  operation- 
al component  of  triggering  and  scheduling  information,  procedural  methods,  etc.,  and  this 
could  complicate  studies  on  these  issues. 

The  problem  solving  knowledge  we  represent  using  the  KML  includes 

(a)  integrity,  security  and  other  constraints  that  must  be  maintained  between  object  oc- 
currences, 

(b)  deductive  rules  that  generate  new  occurrences  using  properties  such  as  transitivity  or 
domain  specific  knowledge  on  the  interaction  between  object  occurrences, 

(c)  domain  specific  expert  knowledge  applicable  to  object  occurrences,  etc. 

Clearly,  the  knowledge  relates  to  object  occurrences  in  the  knowledge  base  but  not  to 
abstract  entities  not  explicitly  stored  as  object  occurrences.  This  is  an  issue  that  must  be  ad- 
dressed in  the  design  of  our  knowledge  representation  model  and  KML,  in  order  to  make  an 
integrated  KBMS  truly  intelligent. 